Wintermute

Reputation: 3045

certbot nginx authentication failure: "Connection reset by peer"

I'm trying to renew an expired certbot SSL certificate for Nginx on Ubuntu 18. I'm getting... well, various weirdness, but the certbot error is:

Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems: Domain: mysite.co.uk Type: connection Detail: ...: Fetching http://mysite.co.uk/.well-known/acme-challenge/rx6m9QMdK0h16ZOJYsq5sx_AZbxI4zWGvJ6o_kt3b-A: Connection reset by peer

I've got the site running on HTTP:

server {
    listen 80;
    listen [::]:80;
    server_name www.mysite.co.uk mysite.co.uk;

    root /var/www/html;
}

...the nginx.conf is telling it to keep its PID in /run/nginx.pid, I can start and stop it via service nginx start|stop and everything's good:

curl -I http://www.mysite.co.uk/
HTTP/1.1 200 OK

I'm not clear how this /.well-known/acme-challenge/ thing is supposed to work. There's certainly no such folder in /var/www/html, but I did read that certbot starts its own server (??) to manage authentication and that it's wise to stop your own while renewing.

So, as root, I do:

cat /run/nginx.pid
> 124876
service nginx stop
lsof -i -P -n | grep LISTEN
> nothing on 80 or 443
cat /run/nginx.pid
> file doesn't exist
certbot certonly --nginx

I know there's a certbot renew command but I'm getting the same results with each, so... anyway. It correctly picks up the domain name from the existing conf, prompts me to renew, and eventually spits out the error above. I also see a couple lines added to nginx error.log:

[notice] 125028#125028: signal process started
[error] 125028#125028: invalid PID number "" in "/run/nginx.pid"

Sure enough, nginx is started and is listening on 80 and 443. I didn't start it. It's also got a new PID. If I try service nginx restart, it fails because it's trying to bind to ports that this other (certbot's ??) Nginx process is already using.

At all times, whether via "proper" nginx or this certbot zombie one, my site is happily returning HTTP 200s to external requests. I've never got a "Connection reset by peer" error myself. Even when I manually created a /var/www/html/.well-known/acme-challenge/test file, it was always served fine.

So... what in the almighty shenanigans is going on? Why is certbot starting an nginx instance it can't see? Why doesn't it stop it? Is it supposed to be creating something in /.well-known/acme-challenge/? Is my nginx instance somehow interfering? What should be happening? What am I doing wrong??

Upvotes: 0

Views: 1550

Answers (1)

Wintermute

Reputation: 3045

Ok, I still don't understand the weirdness with certbot starting its own nginx and not stopping it and mucking up PIDs and all that... but certbot can now see our server and renew the SSL certs. And after two days of IT swearing blind that it wasn't being blocked by a firewall rule... it was the firewall.

Sigh.
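
For anyone hitting the same symptom: my own external requests got a 200 the whole time, so a check from one outside machine doesn't prove the CA can reach you. One rough way to see whether the validation requests reach nginx at all is to count challenge hits in the access log before and after a dry run. This is only a sketch. It assumes the default Ubuntu log path /var/log/nginx/access.log, and count_challenge_hits is a helper name made up for this example.

```shell
#!/bin/sh
# Assumption: nginx logs requests to /var/log/nginx/access.log (the Ubuntu
# package default); adjust if your access_log directive points elsewhere.
ACCESS_LOG="${ACCESS_LOG:-/var/log/nginx/access.log}"

# Hypothetical helper: print how many logged requests targeted the
# ACME challenge path. grep -c exits non-zero on zero matches, so
# "|| true" keeps the function usable under "set -e".
count_challenge_hits() {
    grep -c '/.well-known/acme-challenge/' "$1" || true
}

# Typical use, as root:
#   before=$(count_challenge_hits "$ACCESS_LOG")
#   certbot renew --dry-run    # test run; no certificates are saved
#   after=$(count_challenge_hits "$ACCESS_LOG")
#   [ "$after" -gt "$before" ] || echo "no challenge request reached nginx"
```

If the count doesn't move during the dry run, the requests are probably being dropped before they get to the server, which points at a firewall or something else upstream rather than at nginx or certbot.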

Upvotes: 2
