unficyp

Reputation: 15

Monit EXEC not working when monitored process dies

I'm using Monit 5.15 on FreeBSD 10.2 with the following configuration:

set daemon  5
set logfile syslog
set pidfile /var/run/monit.pid
set idfile /var/.monit.id
set statefile /var/.monit.state
set alert root@localhost
set mailserver localhost
set httpd port 2812 and
     use address 192.168.40.72
     allow 192.168.20.0/24
     allow admin:monit

check process haproxy with pidfile /var/run/haproxy.pid
     if failed host 192.168.40.72 port 9090 type tcp
       then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"

When I run Monit with -vI and kill haproxy, I get the following output:

Adding net allow '192.168.20.0/24'
Adding credentials for user 'admin'
Runtime constants:
 Control file       = /usr/local/etc/monitrc
 Log file           = syslog
 Pid file           = /var/run/monit.pid
 Id file            = /var/.monit.id
 State file         = /var/.monit.state
 Debug              = True
 Log                = True
 Use syslog         = True
 Is Daemon          = True
 Use process engine = True
 Poll time          = 5 seconds with start delay 0 seconds
 Expect buffer      = 256 bytes
 Mail server(s)     = localhost:25 with timeout 30 seconds
 Mail from          = (not defined)
 Mail subject       = (not defined)
 Mail message       = (not defined)
 Start monit httpd  = True
 httpd bind address = 192.168.40.72
 httpd portnumber   = 2812
 httpd ssl          = Disabled
 httpd signature    = Enabled
 httpd auth. style  = Basic Authentication and Host/Net allow list
 Alert mail to      = root@localhost
   Alert on         = All events

The service list contains the following entries:

Process Name          = haproxy
 Pid file             = /var/run/haproxy.pid
 Monitoring mode      = active
 Existence            = if does not exist then restart
 Port                 = if failed [192.168.40.72]:9090 type TCP/IP protocol DEFAULT with timeout 5 seconds then exec '/bin/sh -c /bin/echo `/bin/date` >> /tmp/monit.test'

System Name           = appsrv01
 Monitoring mode      = active

-------------------------------------------------------------------------------
pidfile '/var/run/monit.pid' does not exist
Starting Monit 5.15 daemon with http interface at [192.168.40.72]:2812
Starting Monit HTTP server at [192.168.40.72]:2812
Monit HTTP server started
'appsrv01' Monit 5.15 started
Sending Monit instance changed notification to root@localhost
'haproxy' process is running with pid 42999
'haproxy' zombie check succeeded
'haproxy' succeeded testing protocol [DEFAULT] at [192.168.40.72]:9090 [TCP/IP]
'haproxy' connection succeeded to [192.168.40.72]:9090 [TCP/IP]
'haproxy' process is running with pid 42999
'haproxy' zombie check succeeded
'haproxy' succeeded testing protocol [DEFAULT] at [192.168.40.72]:9090 [TCP/IP]
'haproxy' connection succeeded to [192.168.40.72]:9090 [TCP/IP]
'haproxy' process is running with pid 42999
'haproxy' zombie check succeeded
'haproxy' succeeded testing protocol [DEFAULT] at [192.168.40.72]:9090 [TCP/IP]
'haproxy' connection succeeded to [192.168.40.72]:9090 [TCP/IP]
'haproxy' process test failed [pid=42999] -- No such process
'haproxy' process is not running
Sending Does not exist notification to root@localhost
'haproxy' trying to restart
'haproxy' stop skipped -- method not defined
'haproxy' start method not defined
'haproxy' monitoring enabled
'haproxy' process test failed [pid=42999] -- No such process
'haproxy' process is not running
'haproxy' trying to restart
'haproxy' stop skipped -- method not defined
'haproxy' start method not defined
'haproxy' monitoring enabled
^CShutting down Monit HTTP server
Monit HTTP server stopped
Monit daemon with pid [48685] stopped
'appsrv01' Monit 5.15 stopped
Sending Monit instance changed notification to root@localhost

The exec line never gets executed; I don't see any new lines in /tmp/monit.test.

If I change the checked port from 9090 to an invalid one, say 9190, and start Monit (with haproxy still running), I see:

Starting Monit 5.15 daemon with http interface at [192.168.40.72]:2812
Starting Monit HTTP server at [192.168.40.72]:2812
Monit HTTP server started
'appsrv01' Monit 5.15 started
Sending Monit instance changed notification to root@localhost
'haproxy' process is running with pid 50703
'haproxy' zombie check succeeded
Socket test failed for [192.168.40.72]:9190 -- Connection refused
'haproxy' failed protocol test [DEFAULT] at [192.168.40.72]:9190 [TCP/IP] -- Connection refused
Sending Connection failed notification to root@localhost
'haproxy' exec: /bin/sh
'haproxy' process is running with pid 50703
'haproxy' zombie check succeeded
Socket test failed for [192.168.40.72]:9190 -- Connection refused
'haproxy' failed protocol test [DEFAULT] at [192.168.40.72]:9190 [TCP/IP] -- Connection refused
'haproxy' exec: /bin/sh

Why does the exec line work here but not when I kill -9 haproxy? What I'm trying to do is get Monit to run the exec when haproxy fails. The exec line will then contain a command to switch the CARP IP to another host. haproxy itself is monitored by Zabbix, so the NOC can investigate the cause of the failure later.

Upvotes: 0

Views: 2555

Answers (1)

Dominic

Reputation: 21

When you kill -9 haproxy, you're killing the daemon itself. When Monit then evaluates this "check process" block, it detects that the process is gone and tries to restart it (your log shows this doing nothing, because no start or stop method is defined). It never performs the port check, because it sees that the process isn't there.

It works when you give it an invalid port because the process is still alive. Monit gets as far as the port check, the check fails, and the exec runs.

You should add an additional line to the check block:

check process haproxy with pidfile /var/run/haproxy.pid
     if failed host 192.168.40.72 port 9090 type tcp
         then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"
     if restarted then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"

This should run the shell command both on a restart and on a failed port check.
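
A possible alternative, in case your Monit version does not accept "if restarted": the service list in the question shows the implicit rule "Existence = if does not exist then restart", and Monit's existence test lets you replace that default action with an exec. A minimal sketch, assuming the same pidfile, address, and test command as in the question (not verified on 5.15):

check process haproxy with pidfile /var/run/haproxy.pid
     # Assumption: overrides the default "if does not exist then restart"
     if does not exist
         then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"
     # Unchanged from the question: covers haproxy alive but not listening
     if failed host 192.168.40.72 port 9090 type tcp
         then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"

With this, the exec should fire when the process from the pidfile is gone, while the port test still handles the case where haproxy is running but not accepting connections.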

Upvotes: 2
