Nginx service that is not properly stopped by the certbot and therefore does not restart

Question

I'm having some trouble with my VPS running Debian. It hosts several websites on a stack of nginx, gunicorn, and Django. The sites have SSL certificates managed by Let's Encrypt.

I think the problem occurs when Let's Encrypt tries to renew the certificates.

The error

The system log when the error appears:

Dec 12 00:01:46 vps465872 systemd[1]: Starting Certbot...
Dec 12 00:01:49 vps465872 systemd[1]: Stopping A high performance web server and a reverse proxy server...
Dec 12 00:01:49 vps465872 systemd[1]: Stopped A high performance web server and a reverse proxy server.
Dec 12 00:01:55 vps465872 certbot[600]: nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
Dec 12 00:01:56 vps465872 systemd[1]: Starting A high performance web server and a reverse proxy server...
Dec 12 00:01:56 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:56 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:59 vps465872 nginx[658]: nginx: [emerg] still could not bind()
Dec 12 00:01:59 vps465872 systemd[1]: nginx.service: Control process exited, code=exited status=1
Dec 12 00:01:59 vps465872 systemd[1]: Failed to start A high performance web server and a reverse proxy server.
Dec 12 00:01:59 vps465872 systemd[1]: nginx.service: Unit entered failed state.
Dec 12 00:01:59 vps465872 systemd[1]: nginx.service: Failed with result 'exit-code'.
Dec 12 00:01:59 vps465872 certbot[600]: Hook command "service nginx start" returned error code 1
Dec 12 00:01:59 vps465872 certbot[600]: Error output from service:
Dec 12 00:01:59 vps465872 certbot[600]: Job for nginx.service failed because the control process exited with error code.
Dec 12 00:01:59 vps465872 certbot[600]: See "systemctl status nginx.service" and "journalctl -xe" for details.

The reproduction

So be it. Let's redo the process manually. I kill every leftover nginx process:

ps -ef |grep nginx
kill -9 xxxx
kill -9 xxxx

I relaunch nginx:

service nginx start

then everything works fine.

I do a dry run of certbot:

certbot renew --dry-run

and now I get this error:

Attempting to renew cert (xxx.fr) from /etc/letsencrypt/renewal/xxx.fr.conf produced an unexpected error: Problem binding to port 443: Could not bind to IPv4 or IPv6... Skipping.

The investigation

I look in the /run directory: the file nginx.pid no longer exists.

On the other hand, ps -ef | grep nginx shows that the process is still running, and indeed the websites are working. So if I run service nginx start, I get the address conflict error.
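
The mismatch described above (pid file gone, process still alive) can be checked with a small helper. This is only a sketch: the function name pidfile_status is made up for this example, and it assumes it is run as root so that kill -0 can probe processes owned by other users.

```shell
#!/bin/sh
# Hypothetical helper: classify a pid file as "missing", "stale", or "live".
# Run as root; for an unprivileged user, kill -0 can fail on a process that
# exists but belongs to someone else, which would be reported as "stale".
pidfile_status() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only tests whether the pid exists
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "live"
    else
        echo "stale"
    fi
}
```

Calling pidfile_status /run/nginx.pid and comparing the answer with ps -ef | grep nginx shows the situation from this question: the helper prints "missing" while nginx processes are still listed.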

I found people on Stack Overflow with the same problem, but their solutions don't work for me. They did give me clues about where to look, though. See: Certbot renew: nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)

So I look further: the file /etc/letsencrypt/renewal/xxx.fr.conf contains the following hooks:

[renewalparams]
authenticator = standalone
installer = nginx
pre_hook = service nginx stop
post_hook = service nginx start

Very well. I look at the associated script, /etc/init.d/nginx. At the very beginning it extracts the pid file path via

PID=$(cat /etc/nginx/nginx.conf | grep -Ev '^\s*#' | awk 'BEGIN { RS="[;{}]" } { if ($1 == "pid") print $2 }' | head -n1)

this command works well.
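
To see what that pipeline does, it can be wrapped in a function and pointed at any config file. This is a sketch for experimenting, not part of the init script: the name extract_pid_path is invented here, and it assumes GNU grep (for \s) and an awk that accepts a regular expression as RS, such as gawk or mawk.

```shell
#!/bin/sh
# Same pipeline as in the init script, taking the config path as an argument.
# extract_pid_path is a made-up name for this example.
extract_pid_path() {
    # 1. drop lines that are entirely comments
    # 2. split the rest into records at ';', '{' and '}'
    # 3. print the argument of the first "pid" directive
    grep -Ev '^\s*#' "$1" \
        | awk 'BEGIN { RS="[;{}]" } { if ($1 == "pid") print $2 }' \
        | head -n1
}
```

Running extract_pid_path /etc/nginx/nginx.conf prints the path given by the pid directive, and prints nothing if the directive is absent or commented out.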

to stop:

stop_nginx() {
    start-stop-daemon --stop --quiet --retry=$STOP_SCHEDULE --pidfile $PID --name $NAME
    RETVAL="$?"
    sleep 1
    return "$RETVAL"
}

to start

start_nginx() {
    start-stop-daemon --start --quiet --pidfile $PID --exec $DAEMON --test > /dev/null \
        || return 1
    start-stop-daemon --start --quiet --pidfile $PID --exec $DAEMON -- \
        $DAEMON_OPTS 2>/dev/null \
        || return 2
}

It looks correct. Moreover, when the service is running with its pid file in place, the start and stop commands work fine.

The conclusion

Well, that's all, here I am with a problem I don't understand.

Andrii Rudavko · Accepted Answer

I recommend using webroot mode instead of standalone mode. To renew the certificates, certbot creates a '.well-known/acme-challenge/' directory in your web server's root directory.

The upside is less downtime: instead of 'stop-wait-start', you only need to restart the nginx service via post_hook.
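
Since the server hosts several sites, one way to make the switch is to generate the certbot command for each site and review it before running it. The sketch below only prints commands and runs nothing. The function name webroot_commands and the directory /var/www/xxx.fr are placeholders; it also assumes nginx already serves /.well-known/acme-challenge/ from that directory, which for a site proxied to gunicorn probably needs its own location block.

```shell
#!/bin/sh
# Hypothetical helper: read "domain webroot" pairs on stdin and print one
# certbot command per site. Nothing is executed here.
webroot_commands() {
    while read -r domain webroot; do
        # skip blank lines
        [ -n "$domain" ] || continue
        printf 'certbot certonly --webroot -w %s -d %s --post-hook "service nginx reload"\n' \
            "$webroot" "$domain"
    done
}
```

For example, printf 'xxx.fr /var/www/xxx.fr\n' | webroot_commands prints the command for that one site. After running the printed commands, certbot renew --dry-run should no longer need ports 80 and 443 for itself, so the pre_hook that stops nginx is not needed.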

Hope this alternative solution helps
