Reputation: 433
We have a requirement to monitor and try to restart our gUnicorn/Django app if it goes down. We're using gunicorn 20.0.4.
I have the following nrs.service running fine under systemd, and I'm trying to figure out whether systemd's watchdog can be integrated with gunicorn. Looking through the source, I don't see sd_notify("WATCHDOG=1") being called anywhere, so I'm thinking the answer is no: gunicorn calls sd_notify("READY=1...") at startup, but nothing in its run loop tells systemd that it is still running.
Here's the nrs.service file. I have commented out the watchdog settings because enabling them sends my service into a failed state shortly after it starts.
[Unit]
Description=Gunicorn instance to serve NRS project
After=network.target
[Service]
WorkingDirectory=/etc/nrs
Environment="PATH=/etc/nrs/bin"
ExecStart=/etc/nrs/bin/gunicorn --error-logfile /etc/nrs/logs/gunicorn_error.log --certfile=/etc/httpd/https_certificate/nrs.cer --keyfile=/etc/httpd/https_certificate/server.key --access-logfile /etc/nrs/logs/gunicorn_access.log --capture-output --bind=nrshost:8800 anomalyalerts.wsgi
#WatchdogSec=15s
#Restart=on-failure
#StartLimitInterval=1min
#StartLimitBurst=4
[Install]
WantedBy=multi-user.target
So systemd's watchdog is doing its job; it just looks like gunicorn doesn't support it out of the box. I'm not very familiar with monkey-patching, but if we want to use this method of monitoring, I assume I'll have to implement some custom code? My other thought was to have a watch command check the service and try to restart it, which might be easier.
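For reference, the keep-alive ping itself is small. What follows is only a minimal sketch of the sd_notify protocol in plain Python, assuming the process was started by systemd with NOTIFY_SOCKET set in its environment. The helper name sd_notify is my own here, not gunicorn's API, and where such a call would be hooked into gunicorn's run loop is left open.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is not set, i.e. the process
    was not started by systemd with notifications enabled.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    # A leading "@" denotes a Linux abstract socket, which is
    # addressed with a leading NUL byte instead.
    if address.startswith("@"):
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), address)
    return True


# Hypothetical usage: something would have to call this more often
# than WatchdogSec for the service to stay out of the failed state.
# sd_notify("WATCHDOG=1")
```

With WatchdogSec=15s, the ping would have to arrive well within every 15 seconds, and from a process systemd accepts notifications from (the main process by default, as far as I understand).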
Thanks Jason
Upvotes: 2
Views: 773
Reputation: 5420
monitor and try to restart our gUnicorn/Django app if it goes down
The systemd watchdog will not help in the described case, because it is intended to monitor the main service process, which does not run your app directly.
Gunicorn's master process, which is the main service process from systemd's perspective, is a loop that manages the worker processes. Your app runs inside the worker processes, so if anything goes wrong there, the worker is what should be restarted, not the master.
Gunicorn restarts worker processes automatically (see the timeout setting). As for the main service process, in the rare case that it dies, the Restart=on-failure option can restart it even without a watchdog (see the docs for details on how it behaves).
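To make the worker-restart part concrete, here is a minimal sketch of a Gunicorn config file, which is plain Python. The file name and the worker count are assumptions for illustration; the bind address is taken from the question, and the timeout values shown are Gunicorn's defaults.

```python
# gunicorn.conf.py: hypothetical config file, loaded with
#   gunicorn -c gunicorn.conf.py anomalyalerts.wsgi

# Address from the question's ExecStart line.
bind = "nrshost:8800"

# Assumed value; tune for your host.
workers = 2

# A worker that stays silent for more than this many seconds is
# killed by the master and replaced with a fresh one (default 30).
timeout = 30

# Time a worker gets to finish in-flight requests after receiving
# a restart signal before it is force-killed (default 30).
graceful_timeout = 30
```

With this in place the master handles hung or crashed workers, and uncommenting Restart=on-failure in the unit file covers the master itself, so no WatchdogSec is needed.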
Upvotes: 1