
Users often cannot log in to the Nagios XI web interface, even though they can connect to the Nagios XI server via SSH.

As part of our Server Management Services, we regularly help our customers fix Nagios-related errors.

Today, let us discuss the possible causes of this error and how to fix each one.

Causes for Cannot login to Nagios XI web interface

Some of the common reasons we cannot log in to the Nagios XI web interface include:

  • Wrong/Lost password for the nagiosadmin user.
  • SELinux enabled.
  • The apache service is not running.
  • The firewall is blocking port 80.
  • The mysqld service is not running or there are crashed database tables.
  • The postgresql service is not running or the database is not accepting commands.
  • Other products installed that use Postgres may need their databases vacuumed.

Let us now look at the solution for each of these, one by one.

 

Solutions for “Cannot login to Nagios XI web interface”

Wrong/Lost password for the nagiosadmin user

The most common reason for a failed login to the web interface is a wrong password for the nagiosadmin user.

In such cases, we need to reset the nagiosadmin password.

For this, open an SSH or direct console session to the Nagios XI host and execute the following command:

/usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=newpassword
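Rather than typing a guessable value such as newpassword, the new password can be generated randomly. The following is a minimal sketch, assuming bash and a readable /dev/urandom; gen_password is a hypothetical helper and is not shipped with Nagios XI.

```shell
#!/usr/bin/env bash
# gen_password is a hypothetical helper (not part of Nagios XI).
# It prints a random alphanumeric string of the requested length.
gen_password() {
  local length="${1:-20}" out=""
  # Keep reading random bytes until enough alphanumeric characters remain.
  while [ "${#out}" -lt "$length" ]; do
    out="${out}$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')"
  done
  printf '%s\n' "${out:0:$length}"
}

# Example (run as root on the Nagios XI host):
#   new_pass="$(gen_password 24)"
#   /usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password="$new_pass"
#   echo "nagiosadmin password is now: $new_pass"
```

Only letters and digits are used here so that the value needs no shell quoting when passed to the reset script.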

 

SELinux enabled

We need the policycoreutils package installed to check whether SELinux is enabled. Use the following command to install it:

RHEL|CentOS|Oracle Linux
# yum install -y policycoreutils

Debian | Ubuntu
# apt-get install -y policycoreutils

Now check the status of SELinux by running one of these commands:

sestatus
OR
getenforce

To disable SELinux, run:

setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

The setenforce 0 command switches SELinux to permissive mode immediately, and the sed edit to /etc/selinux/config keeps it disabled after a server reboot.
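The sed command only rewrites a line that reads SELINUX=enforcing, so it is worth confirming what the config file says afterwards. A minimal sketch, assuming the standard /etc/selinux/config layout; selinux_config_mode is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# selinux_config_mode is a hypothetical helper: it prints the persistent
# SELinux mode (enforcing, permissive or disabled) from the config file.
selinux_config_mode() {
  local config="${1:-/etc/selinux/config}"
  # Match only the SELINUX= line; SELINUXTYPE= and comment lines are ignored.
  sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$config" | tail -n 1
}

# Example:
#   getenforce              # mode in effect right now
#   selinux_config_mode     # mode that applies after the next reboot
```

If the helper still prints permissive or enforcing, the config file needs to be edited by hand.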

 

The apache service is not running

If apache is not running, we will see an error message in the browser like:

Firefox
"Unable to connect"

Chrome
"This webpage is not available"

Here, first, check the apache status using one of the commands below:

RHEL 6|CentOS 6|Oracle Linux 6
# service httpd status

RHEL 7|CentOS 7|Oracle Linux 7
# systemctl status httpd.service

Ubuntu 14
# service apache2 status

Debian | Ubuntu 16/18
# systemctl status apache2.service

To start/restart the apache service, run:

RHEL 6|CentOS 6|Oracle Linux 6
# service httpd start
or
# service httpd restart

RHEL 7|CentOS 7|Oracle Linux 7
# systemctl start httpd.service
or
# systemctl restart httpd.service

Ubuntu 14
# service apache2 start
or
# service apache2 restart

Debian|Ubuntu 16/18
# systemctl start apache2.service
or
# systemctl restart apache2.service
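The service name differs between distribution families (httpd versus apache2). The sketch below picks the name from the distribution ID. It assumes /etc/os-release exists, which is generally not the case on RHEL 6 or CentOS 6, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Nagios XI.

# Print the distribution ID from an os-release style file.
os_id() {
  local file="${1:-/etc/os-release}"
  # Source in a subshell so the file's variables do not leak.
  ( . "$file" && printf '%s\n' "$ID" )
}

# Map a distribution ID to the Apache service name used above.
apache_service_for() {
  case "$1" in
    rhel|centos|ol) echo "httpd" ;;
    debian|ubuntu)  echo "apache2" ;;
    *) echo "unknown distribution: $1" >&2; return 1 ;;
  esac
}

# Example (systemd hosts):
#   systemctl restart "$(apache_service_for "$(os_id)").service"
```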

 

The firewall is blocking port 80

Let us now check the steps to unblock the port in the iptables firewall. First, check the status of the firewall:

# service iptables status

Cross-check the output for a line as given below that tells us that the firewall rule exists and allows inbound TCP traffic on port 80:

5 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80

If this firewall rule DOES NOT exist, then it can be added by executing the following commands:

# iptables -I INPUT -p tcp --dport 80 -j ACCEPT
# service iptables save
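The check and the fix can be combined so that the rule is only added when it is missing. This is a sketch under the assumption that the status output looks like the sample line above; has_accept_rule is a hypothetical helper.

```shell
#!/usr/bin/env bash
# has_accept_rule is a hypothetical helper: it reads iptables status
# output on stdin and succeeds if an ACCEPT rule for the given TCP
# destination port is present.
has_accept_rule() {
  local port="$1"
  # The trailing group stops port 80 from matching dpt:8080.
  grep -Eq "ACCEPT[[:space:]]+tcp.*dpt:${port}([^0-9]|\$)"
}

# Example:
#   if ! service iptables status | has_accept_rule 80; then
#     iptables -I INPUT -p tcp --dport 80 -j ACCEPT
#     service iptables save
#   fi
```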

Ubuntu uses the Uncomplicated Firewall (ufw) to manage firewall rules. However, it is not enabled on a default install.

We can check if it is enabled with the following command:

# ufw status

If the status says it is inactive, we can enable the firewall on boot and start it using the command below:

# ufw enable

Now we need to add rules for different ports as the default configuration denies all incoming connections.

To list the firewall rules execute this command:

# ufw status verbose

This should produce output like:

Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip

To Action From
-- ------ ----
80 ALLOW IN Anywhere
80 (v6) ALLOW IN Anywhere (v6)

We can see from the output that firewall rules exist allowing inbound TCP traffic on port 80.

If this firewall rule does not exist, then it can be added by executing the following commands:

# ufw allow http
# ufw reload
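The same kind of check can be scripted for ufw. The sketch below reads the status output and succeeds only when the firewall is active and an ALLOW rule exists for the port; it assumes the output format shown above, and ufw_allows_port is a hypothetical helper.

```shell
#!/usr/bin/env bash
# ufw_allows_port is a hypothetical helper: it reads `ufw status`
# output on stdin and succeeds only if ufw is active and an ALLOW
# rule exists for the given port.
ufw_allows_port() {
  local port="$1"
  awk -v port="$port" '
    $1 == "Status:" && $2 == "active" { active = 1 }
    # Rule lines start with the port, optionally followed by /tcp.
    ($1 == port || $1 == (port "/tcp")) && /ALLOW/ { allowed = 1 }
    END { exit !(active && allowed) }
  '
}

# Example:
#   if ! ufw status verbose | ufw_allows_port 80; then
#     ufw allow http
#     ufw reload
#   fi
```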

 

The mysqld service is not running or there are crashed database tables

When there is an issue with the MySQL database, we will most probably see an error message similar to this one while trying to log in to the web UI:

Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.

Database corruption is usually caused by power outages, running out of disk space, or improperly shutting down the Nagios XI server.

The correct way to shut down the Nagios XI server is to issue the following command in the command line:

# shutdown -h now

If the Nagios XI machine has insufficient disk space then we may see errors like this when the repair database script is run:

/usr/local/nagiosxi/scripts/repairmysql.sh: line 59: 11735 Segmentation fault (core dumped) $cmd $t --sort_buffer_size=256M
Timeout error occurred trying to start MySQL Daemon.
Starting mysqld: [FAILED]
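Before running a repair, it helps to confirm that the relevant filesystems are not full. A minimal sketch using df; disk_usage_percent is a hypothetical helper, and the paths in the example are assumptions about a typical install.

```shell
#!/usr/bin/env bash
# disk_usage_percent is a hypothetical helper: it prints the used
# percentage (without the % sign) of the filesystem holding a path.
disk_usage_percent() {
  local path="${1:-/}"
  # -P forces the POSIX single-line format so column 5 is the usage.
  df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   for p in / /var/lib/mysql /usr/local/nagiosxi; do
#     echo "$p: $(disk_usage_percent "$p")% used"
#   done
```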

We can repair the nagios, nagiosql, and nagiosxi databases by running the following commands in the command line as the root user:

# /usr/local/nagiosxi/scripts/repairmysql.sh nagios
# /usr/local/nagiosxi/scripts/repairmysql.sh nagiosql
# /usr/local/nagiosxi/scripts/repairmysql.sh nagiosxi

Alternatively, from Nagios XI 2014 onwards, we can use:

# cd /usr/local/nagiosxi/scripts/
# ./repair_databases.sh

For an error like the one below:

SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1293570334)
SQL: SQL Error [ndoutils] : Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed CLEANING ndoutils TABLE 'notifications'...

You may need to run a force repair on the tables using the commands below:

RHEL 6|CentOS 6|Oracle Linux 6
# service mysqld stop
# cd /var/lib/mysql/nagios
# myisamchk -r -f nagios_<corrupted_table>
# service mysqld start
# rm -f /usr/local/nagiosxi/var/dbmaint.lock
# php /usr/local/nagiosxi/cron/dbmaint.php

RHEL 7|CentOS 7|Oracle Linux 7|Debian 9
# systemctl stop mariadb.service
# cd /var/lib/mysql/nagios
# myisamchk -r -f nagios_<corrupted_table>
# systemctl start mariadb.service
# rm -f /usr/local/nagiosxi/var/dbmaint.lock
# php /usr/local/nagiosxi/cron/dbmaint.php

Ubuntu 14
# service mysql stop
# cd /var/lib/mysql/nagios
# myisamchk -r -f nagios_<corrupted_table>
# service mysql start
# rm -f /usr/local/nagiosxi/var/dbmaint.lock
# php /usr/local/nagiosxi/cron/dbmaint.php

Debian 8|Ubuntu 16/18
# systemctl stop mysql.service
# cd /var/lib/mysql/nagios
# myisamchk -r -f nagios_<corrupted_table>
# systemctl start mysql.service
# rm -f /usr/local/nagiosxi/var/dbmaint.lock
# php /usr/local/nagiosxi/cron/dbmaint.php
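The value to use in place of nagios_<corrupted_table> can be read from the error message itself. The sketch below extracts the database and table name from a line in the format shown above; crashed_table is a hypothetical helper, and error_line in the example is a placeholder for wherever the message was captured.

```shell
#!/usr/bin/env bash
# crashed_table is a hypothetical helper: it reads error text on stdin
# and prints "<database> <table>" for each "marked as crashed" line.
crashed_table() {
  sed -n "s|.*Table '\./\([^/]*\)/\([^']*\)' is marked as crashed.*|\1 \2|p"
}

# Example (error_line holds the message copied from the web UI or a log):
#   read -r db table <<< "$(printf '%s\n' "$error_line" | crashed_table)"
#   cd "/var/lib/mysql/$db" && myisamchk -r -f "$table"
```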

In certain instances, it may be necessary to truncate (empty) one or more tables. The following commands provide examples of how to truncate both the nagios_logentries and nagios_notifications tables in the Nagios MySQL database:

# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_notifications'

Running these commands will clear all entries from the affected tables. After we truncate tables, we should repeat the repair process outlined above.

The postgresql service is not running or the database is not accepting commands

For problems with PostgreSQL, we will see an additional message that looks something like this:

SQL: SQL Error [nagiosxi] : Database connection failed SQL: SQL Error [nagiosxi] : Database connection failed SQL: SQL Error [nagiosxi] : Database connection failed
Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.

With this message, we will need to make sure that:

  • We are not running out of disk space (check with the df command).
  • PostgreSQL is running and we can actually log in to the database manually.

Try to start/restart PostgreSQL to see if it would start normally using one of the commands below:

RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14

# service postgresql start
or
# service postgresql restart

RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18

# systemctl start postgresql.service
or
# systemctl restart postgresql.service

Sometimes, we will need to run a vacuum on the Postgres database.

First, determine the version of Postgres with the following command:

# postgres -V

Based on that output, execute the commands specific to the version:

Versions BEFORE 9

RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14
# echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
# service postgresql restart

RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18
# echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
# systemctl restart postgresql.service

Versions 9 onwards

RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14
# echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
# service postgresql restart

RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18
# echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
# systemctl restart postgresql.service
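The version check and the choice of vacuum statements can be scripted together. This is a sketch that assumes postgres -V prints a line of the form "postgres (PostgreSQL) 9.2.24"; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Nagios XI or PostgreSQL.

# Read `postgres -V` output on stdin and print the major version number.
pg_major_version() {
  sed -n 's/^postgres (PostgreSQL) \([0-9][0-9]*\).*/\1/p'
}

# Print the vacuum statements the steps above use for a major version.
vacuum_sql_for() {
  if [ "$1" -ge 9 ]; then
    echo "vacuum;vacuum analyze;vacuum full;"
  else
    echo "vacuum;vacuum analyze;"
  fi
}

# Example:
#   major="$(postgres -V | pg_major_version)"
#   vacuum_sql_for "$major" | psql nagiosxi postgres
```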

To log in to Postgres manually, run:

# psql nagiosxi nagiosxi

Sometimes, it shows an error message as shown below:

psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to vacuum database "postgres".

We may notice either high CPU usage for the postmaster process or a repeated error message in the log files under /var/lib/pgsql/data/pg_log:

transaction ID wrap limit is 2147484146

We can try to fix the issue by running the following commands in the command line:

Versions BEFORE PostgreSQL 9

RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14
# service postgresql stop
# su postgres
# echo "VACUUM;" > /tmp/fix.sql
# postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
# postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
# postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
# exit
# service postgresql start

RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18
# systemctl stop postgresql.service
# su postgres
# echo "VACUUM;" > /tmp/fix.sql
# postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
# postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
# postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
# exit
# systemctl start postgresql.service

Other products installed that use Postgres may need their databases vacuumed

If we have another piece of software installed on the Nagios XI server that uses Postgres, such as Nagios Fusion, we may need to vacuum the databases of that software as well as those mentioned in previous steps.

In the case of Fusion specifically, the following commands need to be run in addition to those in the previous steps:

# postgres -D /var/lib/pgsql/data nagiosfusion < /tmp/fix.sql
or
# postgres --single -D /var/lib/pgsql/data nagiosfusion < /tmp/fix.sql
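When several databases need the standalone vacuum, it can help to print the full command list for review before running anything. The sketch below only prints commands in the two forms shown above and executes nothing; print_standalone_vacuum is a hypothetical helper, and which form applies depends on the installed PostgreSQL version.

```shell
#!/usr/bin/env bash
# print_standalone_vacuum is a hypothetical helper. It prints (but does
# not run) one standalone-backend command per database.
# Usage: print_standalone_vacuum [--single] DATA_DIR SQL_FILE DB...
print_standalone_vacuum() {
  local single=""
  if [ "$1" = "--single" ]; then
    single=" --single"
    shift
  fi
  local data_dir="$1" sql_file="$2" db
  shift 2
  for db in "$@"; do
    printf 'postgres%s -D %s %s < %s\n' "$single" "$data_dir" "$db" "$sql_file"
  done
}

# Example:
#   print_standalone_vacuum /var/lib/pgsql/data /tmp/fix.sql \
#     nagiosxi postgres template1 nagiosfusion
```

The printed lines are meant to be run as the postgres user while the postgresql service is stopped, as in the steps above.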


Conclusion

In short, users often cannot log in to the Nagios XI web interface due to a number of reasons that range from the wrong password to the SELinux policy. Today, we saw how our Support Engineers fix this error.

 

 

 


