Reputation: 41
I have created several hostgroups in Nagios, and each hostgroup consists of several hosts. The hosts run applications that are monitored by service checks, always at least 7 per application. My problem is this: when, let's say, the application on host A in hostgroup "Testing" goes down, I suddenly receive about 7 notifications, each telling me that one of the 7 checks for the application on host A in hostgroup "Testing" is in a critical state.
What I would like is to configure Nagios to send one notification that a service check (for instance, message count) is in a critical state on host A, and then at most one more notification that hostgroup "Testing" has a problem. That way I do not get notified 7 times, and I know that I have to fix a problem on a specific host in a specific hostgroup. It is much clearer what needs to be solved.
To add another example: when the application on host A goes down and I receive, let's say, 10 notifications, a few seconds later the http_checks start to notify me as well, because Apache does not receive data from the application that is down. So I end up solving one problem while receiving 20 or more notifications. What I would like is a maximum of 4 notifications: one from a single service check on host A and one for the hostgroup that host A belongs to, then the same pair for Apache and its hostgroup. If both are in the same hostgroup, there would be just 2 notifications in total.
If a similar problem occurs in another hostgroup at the same time, I would again know that there are two hostgroups with a problem to fix. The current situation, however, is that I receive about 50 notifications and get confused about where to start and what the real problem is.
Is anyone else facing a similar problem? I searched for quite a long time for a similar topic. I tried to use dependencies but did not find a way to configure Nagios for the situation I described above. Parent-child relationships can be used only between hosts. Escalations do not solve this problem at all.
Maybe I just missed something in the documentation regarding this configuration. I would appreciate any advice.
Upvotes: 4
Views: 6491
Reputation: 1482
Nagios can do this with service dependencies. See: http://nagios.sourceforge.net/docs/nagioscore/3/en/dependencies.html
But it's a real pain to set up and keep managed. I found the simplest solution was to use the fact that an NRPE command (defined on the target host) can execute a nearly unlimited number of actual checks, all via a single Nagios service check. I 'bundle' all the checks for a single application (process up/down, various log scrapes, log ages, etc.) so that each application has only a single service check. The check output tells you which of the bundled checks has failed.
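To illustrate the bundling idea, here is a minimal sketch in Python of a wrapper script that an NRPE command could point at. It is not my actual script: the sub-check names and messages are made-up placeholders, and the rule for picking the overall state (CRITICAL over WARNING over UNKNOWN over OK) is an assumption you may want to change. The only Nagios-specific facts it relies on are the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and that the first line of output is what ends up in the notification.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed ranking used to pick the overall state of the bundle.
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def bundle(results):
    """Collapse many sub-check results into one (exit_code, output_line).

    results is a list of (name, status, message) tuples, one per sub-check.
    """
    if not results:
        return UNKNOWN, "UNKNOWN - no sub-checks ran"
    # The bundle is as bad as its worst sub-check.
    worst = max((status for _, status, _ in results), key=SEVERITY.__getitem__)
    # Only the sub-checks that are not OK are named in the output.
    failed = [f"{name}: {msg}" for name, status, msg in results if status != OK]
    if failed:
        summary = "; ".join(failed)
    else:
        summary = f"all {len(results)} sub-checks passed"
    return worst, f"{LABELS[worst]} - {summary}"


def run_sub_checks():
    """Hypothetical placeholder: real code would inspect processes, logs, etc."""
    return [
        ("process", OK, "running"),
        ("log_age", CRITICAL, "log is 45 min old"),
        ("msg_count", WARNING, "120 queued"),
    ]


def main():
    code, line = bundle(run_sub_checks())
    print(line)
    return code

# A script invoked by NRPE would end with: sys.exit(main())
```

With the placeholder results above, Nagios would see one CRITICAL service and send a single notification whose text names both the log_age and msg_count sub-checks, instead of one notification per check.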
Upvotes: 2