Dean Sherwin

Reputation: 498

Django, Django Dynamic Scraper, Djcelery and Scrapyd - Not Sending Tasks in Production

I'm using Django Dynamic Scraper to build a basic web scraper. I have it 99% of the way finished. It works perfectly in development alongside Celery and Scrapyd. Tasks are sent and fulfilled perfectly.

As for production, I'm pretty sure I have things set up correctly: I'm using Supervisor to run Scrapyd and Celery on my VPS. They are both pointing at the correct virtualenv installations etc...

Here's how I know they're both set up correctly for the project: when I SSH into my server and use the manage.py shell to execute a Celery task, it returns an AsyncResult and the task is executed. The results appear in the database, and both my Scrapyd and Celery logs show the task being processed.
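For reference, this is roughly what that manual check looks like from the Django shell. The task module and name below are placeholders for whatever task django-dynamic-scraper registers in this project; the point is just that .delay() hands a task straight to the running worker:

# python manage.py shell (on the server, inside the virtualenv)
# 'run_spiders' is a hypothetical task name used here for illustration.
from IG_Tracker.tasks import run_spiders

result = run_spiders.delay()   # queued on the broker, picked up by the worker
result.ready()                 # True once the worker has finished the task
result.get(timeout=30)         # assumes a result backend (e.g. djcelery's database backend)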

The issue is that in production my scheduled tasks are not fired automatically, despite working perfectly fine in development.

# django-celery settings

import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
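Since the DatabaseScheduler is being used, the schedule itself lives in djcelery's ORM tables rather than in settings.py. A minimal sketch of adding an entry (the task path is an assumption for this project; the same thing can be done through the Django admin):

from djcelery.models import IntervalSchedule, PeriodicTask

# Run the scraping task once an hour (task path assumed for illustration).
every_hour, _ = IntervalSchedule.objects.get_or_create(every=1, period='hours')
PeriodicTask.objects.get_or_create(
    name='Run scrapers hourly',
    task='IG_Tracker.tasks.run_spiders',
    interval=every_hour,
)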

And my Supervisor configs:

Celery Config:

[program:IG_Tracker]
command=/home/dean/Development/IG_Tracker/venv/bin/celery --app=IG_Tracker.celery:app worker --loglevel=INFO -n worker.%%h
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
killasgroup=true
priority=998

Scrapyd Config:

[program:scrapyd]
directory=/home/dean/Development/IG_Tracker/instagram/ig_scraper
command=/home/dean/Development/IG_Tracker/venv/bin/scrapyd
environment=MY_SETTINGS=/home/dean/Development/IG_Tracker/IG_Trackersettings.py
user=dean
autostart=true
autorestart=true
redirect_stderr=true
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
stderr_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
startsecs=10

I have followed the docs as closely as I could and used the recommended deployment tools (e.g. scrapyd-deploy). Additionally, when I run Celery and Scrapyd manually on the server (as one would in development), everything works fine. It is only when the two are run under Supervisor that the scheduled tasks fail to fire.

I'm probably missing some setting or other which is preventing the Celery tasks stored in the SQLite database from being picked up and run automatically by Celery/Scrapyd in production.
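For context, the schedule rows themselves are visible from the Django shell; a minimal check (assuming djcelery's tables, as per the settings above) shows whether the stored entries have ever been dispatched (last_run_at stays empty if nothing is scheduling them):

from djcelery.models import PeriodicTask

# List the scheduled tasks the DatabaseScheduler would dispatch.
for pt in PeriodicTask.objects.all():
    print("%s enabled=%s last_run_at=%s" % (pt.name, pt.enabled, pt.last_run_at))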

Upvotes: 1

Views: 223

Answers (1)

Dean Sherwin

Reputation: 498

Okay, so I eventually got this working. Maybe this can help someone else. My issue was that I only had ONE Supervisor process for Celery, whereas it needs two: one that actually runs the tasks (the worker) and another that supervises the scheduling (beat). I only had the worker. This explains why everything worked fine when I fired off a task using the Django shell (essentially passing a task to the worker manually).

Here is my conf file for the 'scheduler' celery process:

[program:celery_beat]
command=/home/dean/Development/IG_Tracker/venv/bin/celery beat -A IG_Tracker --loglevel=INFO
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs = 600
killasgroup=true
priority=998

I added that and ran:

supervisorctl reread
supervisorctl update
supervisorctl restart all

My tasks began running right away.

Upvotes: 1
