pnv

Reputation: 3135

Why is the watchdog not kicking in?

I'm trying to configure a watchdog on CoreOS. The service looks something like this:

[Unit]
Description=Watchdog example service

[Service]
Type=notify
Environment=NOTIFY_SOCKET=/run/%p.sock
Environment=WATCHDOG_USEC=1000000
ExecStartPre=-/usr/bin/docker kill %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStart=/usr/libexec/sdnotify-proxy /run/%p.sock /usr/bin/docker run \
    --env=NOTIFY_SOCKET=/run/%p.sock \
    -v /run:/run \
    --name %p pranav93/test_watchdogged python hello.py
ExecStop=/usr/bin/docker stop %p

WatchdogSec=1

[Install]
WantedBy=multi-user.target

The Python file hello.py looks something like this:

import os
import time

# sd_notifyd is a notification helper defined elsewhere (not shown here)
print 'Hello, in hello.py'
print 'ready sending'
x = sd_notifyd({'READY':1})
print str(x)
print 'watchdog sending'
x = sd_notifyd({'WATCHDOG':1})
print str(x)
# Show whether WATCHDOG_USEC reached the container
print os.environ.get('WATCHDOG_USEC', None)
print 'lol, wait now for sometime'
for i in range(3):
    print i
    time.sleep(1)
print 'finished'

Although I'm not sending WATCHDOG=1 pings to systemd, the service is not halted by it and does not move to the 'failed' state. What could be the reason? The logs are:

Oct 06 09:33:19 core-01 systemd[1]: Starting Watchdog example service...
Oct 06 09:33:19 core-01 docker[2779]: watchdogged
Oct 06 09:33:19 core-01 docker[2790]: watchdogged
Oct 06 09:33:19 core-01 sdnotify-proxy[2800]: True
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: ready sending
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: <socket._socketobject object at 0x7fa3cc3c2440>
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: 1
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: watchdog sending
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: <socket._socketobject object at 0x7fa3cc3c2440>
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: 1
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: None
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: lol, wait now for someyime
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: 0
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: 1
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: 2
Oct 06 09:33:22 core-01 sdnotify-proxy[2800]: finished
Oct 06 09:33:22 core-01 docker[2851]: watchdogged
Oct 06 09:33:22 core-01 systemd[1]: Started Watchdog example service.

Upvotes: 0

Views: 978

Answers (1)

Eugene Yakubovich

Reputation: 96

I notice a couple of things. First, the line "Started Watchdog example service." is logged 3 seconds after "Starting", once the program has already exited, which indicates that READY=1 was not received. Watchdog monitoring only kicks in after the unit has been "started".
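To check whether READY=1 can get through at all, it may help to send it with as little code as possible. Below is a minimal sketch in Python 3 (the question uses Python 2), assuming NOTIFY_SOCKET names a Unix datagram socket, which is how systemd's notification protocol works. The function name sd_notify and its env parameter are my own choices for illustration, not the helper from the question.

```python
import os
import socket


def sd_notify(state, env=os.environ):
    """Send one notification datagram to the socket named by NOTIFY_SOCKET.

    Returns True if a datagram was sent, False if NOTIFY_SOCKET is unset.
    The `env` parameter exists only so the function is easy to test.
    """
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(state.encode("utf-8"), path)
    finally:
        sock.close()
    return True
```

Calling sd_notify("READY=1") right at startup, and sd_notify("WATCHDOG=1") afterwards, should be enough to see whether the "Started" line moves earlier in the log.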

Also, try to log using print >>sys.stderr, since output to stdout is buffered, which makes the timing hard to see.
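A small sketch of what I mean, written for Python 3 (the question uses Python 2). The helper name log is hypothetical. It writes to stderr, prefixes each message with a timestamp, and flushes immediately so the journal shows when each step really happened.

```python
import sys
import time


def log(message, stream=sys.stderr):
    """Write a timestamped line and flush it right away."""
    stream.write("%.3f %s\n" % (time.time(), message))
    stream.flush()
```

Running the interpreter with python -u is another way to turn off output buffering.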

You should not have

Environment=NOTIFY_SOCKET=/run/%p.sock
Environment=WATCHDOG_USEC=1000000

as these are set by systemd. You should pass the proxy socket to the container via --env, and also pass WATCHDOG_USEC, as it will be "lost" otherwise:

ExecStart=/usr/libexec/sdnotify-proxy /run/%p.sock /usr/bin/docker run \
--env=NOTIFY_SOCKET=/run/%p.sock --env=WATCHDOG_USEC=1000000
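Once WATCHDOG_USEC reaches the container, the program can derive its ping interval from it instead of hard-coding one. A rough sketch in Python 3, with a hypothetical function name, following the usual advice to ping at half the watchdog timeout:

```python
import os


def watchdog_interval(env=os.environ):
    """Return the suggested number of seconds between WATCHDOG=1 pings.

    Returns None when WATCHDOG_USEC is not set, i.e. no watchdog is expected.
    """
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None
    # WATCHDOG_USEC is in microseconds; ping at half the timeout.
    return int(usec) / 1000000.0 / 2
```

With WATCHDOG_USEC=1000000 this gives 0.5 seconds, so the three one-second sleeps in hello.py would be expected to trip the watchdog once the unit is considered started.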

Upvotes: 2
