Reputation: 1240
I have a service check that I've found on the Nagios Exchange site which works well for small directories, but not well for larger ones that take longer than 30 or 60 seconds to complete.
The problem I'm having is that I need to configure a service check that Nagios can run once a day but will remain open for 1440 minutes (one day). The directory listing is huge and takes many hours to complete (up to 20 hours).
This is my service check. It runs once a day, and the NRPE timeout is 86400 seconds, which is also one day. But even though I can see du -sk running on the command line in the output of ps -ef | grep du, Nagios reports "(Service Check Timed Out)":
define service {
use generic-service,srv-pnp
host_name IMAGEServer1
service_description Images
check_command check_nrpe!check_dirsize -t 86400
check_interval 1440
}
In my nrpe.cfg file on the Linux server I have these two directives as well:
command_timeout=86400
connection_timeout=86400
How can I get Nagios to complete the check and not time out? I was under the impression that my directives above were correct.
Upvotes: 0
Views: 2962
Reputation: 24473
What's timing out is the check_nrpe command on the local side (it has a default timeout of 2 minutes). You could edit its command definition to use a long timeout.
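As a rough sketch of what such a command definition might look like on the Nagios server: the name check_nrpe_long is made up for illustration, and the plugin path assumes the usual $USER1$ macro points at your plugin directory. The -t flag is passed to check_nrpe itself here, not appended to the remote command name.

```
# Hypothetical command definition; check_nrpe_long is an illustrative name.
define command {
    command_name    check_nrpe_long
    # -t sets the check_nrpe timeout in seconds (86400 = one day)
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 86400
}
```

The service would then use check_command check_nrpe_long!check_dirsize. Nagios itself also enforces service_check_timeout in nagios.cfg (60 seconds by default), so that value would probably need raising as well for a check this long.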
Alternatively, you might want to do this as a passive check on IMAGEServer1, running as a cron job.
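A minimal sketch of the cron-driven passive check, assuming NSCA is set up to accept passive results on the Nagios server. The directory path, thresholds, NSCA host name, and config path below are all placeholders and not taken from the question; only the host name and service description come from the service definition above.

```shell
#!/bin/sh
# Hypothetical passive-check wrapper, meant to be run from cron on IMAGEServer1.
# Placeholders (assumptions): TARGET_DIR, WARN_KB, CRIT_KB, the NSCA host
# nagios.example.com, and the send_nsca config path.

HOST_NAME="IMAGEServer1"                     # must match host_name in Nagios
SERVICE_DESC="Images"                        # must match service_description
TARGET_DIR="${TARGET_DIR:-/path/to/images}"  # placeholder directory
WARN_KB="${WARN_KB:-500000000}"              # placeholder warning threshold
CRIT_KB="${CRIT_KB:-600000000}"              # placeholder critical threshold

# Map a size in KB to a Nagios state: 0 = OK, 1 = WARNING, 2 = CRITICAL.
size_to_state() {
    if [ "$1" -ge "$CRIT_KB" ]; then
        echo 2
    elif [ "$1" -ge "$WARN_KB" ]; then
        echo 1
    else
        echo 0
    fi
}

# Build one passive result line: host<TAB>service<TAB>state<TAB>output.
format_result() {
    printf '%s\t%s\t%s\t%s\n' "$HOST_NAME" "$SERVICE_DESC" "$1" "$2"
}

main() {
    # du may run for hours; nothing on the Nagios side is waiting on it.
    size_kb=$(du -sk "$TARGET_DIR" | cut -f1)
    state=$(size_to_state "$size_kb")
    format_result "$state" "Directory size: ${size_kb} KB|size=${size_kb}KB" |
        send_nsca -H nagios.example.com -c /etc/send_nsca.cfg
}

# Only do the real work when called with "run", so the helpers can be tested.
if [ "${1:-}" = "run" ]; then
    main
fi
```

It could be scheduled with a crontab entry such as 0 1 * * * /usr/local/bin/check_dirsize_passive.sh run (the script path is hypothetical). On the Nagios side the Images service would need passive checks enabled and active checks disabled, so that Nagios never starts a timer for the long du run.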
Upvotes: 1