merlin

Reputation: 2917

Why does Scrapyd time out due to a refused connection?

I am operating several cloud instances on which scrapyd schedules scrapy crawlers that write to a remote DB server (MySQL 8.x on Ubuntu 20.04). This worked for months. Suddenly it was no longer possible to deploy to one of the servers with scrapyd-deploy; the deploy timed out:

nginx error log:

2021/12/16 17:33:16 [error] 1221#1221: *1433 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 95.117.76.85, server: , request: "POST /addversion.json HTTP/1.1", upstream: "http://127.0.0.1:6800/addversion.json", host: "myip:6843"
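One check that would isolate nginx from scrapyd itself is to POST the egg directly to the upstream port from the affected server; the project name, version and egg path below are placeholders for my actual values:

# hit scrapyd directly on its own port, bypassing nginx, to see
# whether addversion.json itself hangs (placeholder project/version/egg)
/usr/bin/curl --silent http://127.0.0.1:6800/addversion.json -F project=myproject -F version=r1 -F egg=@myproject.egg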

At first I thought a new bug had been introduced, since the last (untouched) machine was still able to schedule new crawlers, but now, after a reboot, scheduling times out there as well:

/usr/bin/curl --silent http://localhost:6800/schedule.json -d project=myproject -d spider=mycrawler

Even listjobs.json times out. Only a reboot or a restart of scrapyd brings back the basic functions like listjobs, but scheduling still does not work.
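For completeness, the basic health checks I am referring to look like this (daemonstatus.json reports the pending/running/finished job counts; myproject is a placeholder):

# overall daemon status: pending/running/finished counts
/usr/bin/curl --silent http://localhost:6800/daemonstatus.json

# jobs of a single project; works right after a restart, hangs in the broken state
/usr/bin/curl --silent "http://localhost:6800/listjobs.json?project=myproject"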

The DB connection seems OK: I can log in with the user in question and its password from that machine. Several spiders were running this morning, and while that was the case I could not start new ones on the same machine.
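This is the kind of manual check I did for the DB connection (host and user are placeholders for my actual values):

# verify the crawler's MySQL credentials from the affected machine
mysql -h db.example.com -u crawler_user -p -e "SELECT 1;"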

I am running out of ideas on how to debug this. Any help is appreciated.

Upvotes: 1

Views: 152

Answers (0)
