Reputation: 51
My normal method of testing the notification and escalation chain is to simulate a failure by actually causing one, for example by blocking a port.
But this is thoroughly unsatisfying. I don't want downtime recorded in Nagios where there was none. I also don't want to wait.
Does anyone know a way to test a notification chain without causing the outage? For example something like this:
$ ./check_notifications_chain <service|host> <time down>
at <x> minutes notification email sent to group <people>
at <2x> minutes notification email sent to group <people>
at <3x> minutes escalated to group <management>
at <200x> rm -rf; shutdown -h now executed.
Extending this paradigm, I might make the notification chain a Nagios check in itself, but I'll stop here before my brain explodes.
Anyone?
Upvotes: 5
Views: 15764
Reputation: 31
This is an old post, but maybe my solution can help someone.
I use the plugin "check_dummy", which is in the Nagios plugins pack. As its name says, it is a dummy: it simply returns the state you pass to it.
Here are some examples of how it works:
Usage:
check_dummy <integer state> [optional text]
$ ./check_dummy 0
OK
$ ./check_dummy 2
CRITICAL
$ ./check_dummy 3 salut
UNKNOWN: salut
$ ./check_dummy 1 azerty
WARNING: azerty
$ echo $?
1
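If the plugin is not at hand, the behaviour shown above can be approximated with a small shell function. This is only a sketch and not the real plugin's source: the name dummy_check is hypothetical, and the handling of out-of-range states is an assumption on my part.

```shell
#!/bin/bash
# Hypothetical stand-in for check_dummy, for illustration only.
# Prints the state label (plus optional text) and returns the state
# as the exit code, which is what Nagios uses to decide the service state.
dummy_check() {
    local state="$1"
    shift
    local text="$*"
    local label
    case "$state" in
        0) label="OK" ;;
        1) label="WARNING" ;;
        2) label="CRITICAL" ;;
        3) label="UNKNOWN" ;;
        *)
            # Assumption: treat anything else as UNKNOWN.
            echo "UNKNOWN: invalid state '$state'"
            return 3
            ;;
    esac
    if [ -n "$text" ]; then
        echo "$label: $text"
    else
        echo "$label"
    fi
    return "$state"
}
```

The important part is the exit code: 0, 1, 2 and 3 map to OK, WARNING, CRITICAL and UNKNOWN, and the printed text is only what ends up in the notification.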
I create a file which contains the integer state and the optional text:
echo 0 OKAY | sudo tee /usr/local/nagios/libexec/dummy.txt
sudo chown nagios:nagios /usr/local/nagios/libexec/dummy.txt
Then I define the command:
# Dummy check (notifications tests)
define command {
command_name my_check_dummy
command_line $USER1$/check_dummy $(cat /usr/local/nagios/libexec/dummy.txt)
}
And the associated service definition:
define service {
use generic-service
host_name localhost
service_description Dummy check
check_period 24x7
check_interval 1
max_check_attempts 1
retry_interval 1
notifications_enabled 1
notification_options w,u,c,r
notification_interval 0
notification_period 24x7
check_command my_check_dummy
}
Now I just change the contents of the file "dummy.txt" to change the service state:
echo "2 Oups" | sudo tee /usr/local/nagios/libexec/dummy.txt
echo "1 AHHHH" | sudo tee /usr/local/nagios/libexec/dummy.txt
echo "0 Parfait !" | sudo tee /usr/local/nagios/libexec/dummy.txt
This allowed me to debug my notification program.
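To avoid typing the tee commands by hand, the state change could be wrapped in a small helper. This is a rough sketch under a few assumptions: the function name set_dummy_state and the DUMMY_FILE variable are made up for this example, and the user running it needs write access to the directory holding dummy.txt. It writes to a temporary file first and then renames it, so a check running at the same moment should not read a half-written file.

```shell
#!/bin/bash
# Hypothetical helper: set the state that the dummy check will report.
# DUMMY_FILE defaults to the path used in the command definition above.
DUMMY_FILE="${DUMMY_FILE:-/usr/local/nagios/libexec/dummy.txt}"

set_dummy_state() {
    local state="$1"
    shift
    # Only the four plugin states are accepted.
    case "$state" in
        0|1|2|3) ;;
        *)
            echo "usage: set_dummy_state <0|1|2|3> [text]" >&2
            return 64
            ;;
    esac
    local line="$state"
    if [ $# -gt 0 ]; then
        line="$line $*"
    fi
    # Write to a temp file in the same directory, then rename over the target.
    local tmp
    tmp=$(mktemp "${DUMMY_FILE}.XXXXXX") || return 1
    printf '%s\n' "$line" > "$tmp"
    chmod 644 "$tmp"    # mktemp creates the file 0600; the nagios user must be able to read it
    mv "$tmp" "$DUMMY_FILE"
}

# Example usage:
#   set_dummy_state 2 Oups    # service goes CRITICAL on the next check
#   set_dummy_state 0 OKAY    # service recovers
```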
Hope it helps!
Upvotes: 3
Reputation: 581
If you only want to verify that the email alerts are working properly, you could create a simple test service, which generates a warning once a day.
test_alert.sh:
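#!/bin/bash
# Current UTC time as HHMM, e.g. 1905.
now=$(date -u +%H%M)
echo "$now"
echo "Nagios test script. Intentionally generates a warning daily."
# WARNING (exit 1) between 19:00 and 19:20 UTC, OK (exit 0) otherwise.
# The 10# prefix forces base 10; without it, [[ ]] reads 0800 as invalid octal.
if [[ "10#$now" -ge 1900 && "10#$now" -le 1920 ]]; then
    exit 1
else
    exit 0
fi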
commands.cfg:
define command{
command_name test_alert
command_line /bin/bash /usr/local/scripts/test_alert.sh
}
services.cfg:
define service {
host_name localhost
service_description Test Alert
check_command test_alert
use generic-service
}
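Since the script only fires in a fixed window, it can be handy to test the window logic without waiting until 19:00 UTC. A minimal sketch, assuming the comparison is pulled out into a function that takes the time as an argument (the name in_alert_window and its optional start/end parameters are hypothetical, not part of the script above):

```shell
#!/bin/bash
# in_alert_window HHMM [START] [END]
# Succeeds (status 0) when HHMM lies within START..END inclusive.
# Defaults match the 1900-1920 window of test_alert.sh.
in_alert_window() {
    local now="$1"
    local start="${2:-1900}"
    local end="${3:-1920}"
    # The single-bracket test compares plain decimal integers,
    # so a value with a leading zero such as 0800 is handled fine.
    [ "$now" -ge "$start" ] && [ "$now" -le "$end" ]
}

# Example usage inside the check script:
#   if in_alert_window "$(date -u +%H%M)"; then exit 1; else exit 0; fi
# Or by hand, to try arbitrary times:
#   in_alert_window 1905 && echo "would warn"
```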
Upvotes: 6