Reputation: 10249
I have the following set-up:
CELERYD_OPTS="--time-limit=600 -c:low_p 100 -c:high_p 50 -Q:low_p low_priority_queue_name -Q:high_p high_priority_queue_name"
My problem is that sometimes the queue seems to "back up"... that is, it will stop consuming tasks. There seem to be two scenarios for this:
- celery inspect active will show that not all workers are used up - that is, I will only see a few active tasks
- strace on the worker processes returns nothing... completely zero activity from the worker

I would appreciate any information or pointers on:
- How to debug this issue. I have tried using strace to see what the worker processes are doing, but so far that has only been useful in telling me that the worker is hanging.
- What monitoring/alarming solutions exist for this (I know about flower and events, and they are both excellent in real time, but they don't have any automated monitoring/alarming functionality). Am I just better off writing my own monitoring tools with supervisord?

Also, I am starting my tasks from django-celery.
Upvotes: 13
Views: 8050
Reputation: 5757
@goro, if you are making requests to external services, you should try the gevent or eventlet pool implementation instead of spawning 100500 workers. I also had a problem where celery workers stopped consuming tasks; it was caused by a bug in the celery+gevent+sentry (raven) combination.
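For example, a minimal sketch of running one worker on the eventlet pool (the project name and concurrency value are just placeholders; the queue name and time limit are taken from the question):

celery worker -A yourproject -P eventlet -c 500 -Q low_priority_queue_name --time-limit=600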
One thing I figured out about Celery is that it can work fine without any monitoring if everything is done right (currently I'm doing >50M tasks per day), but if it's not, monitoring will not help you very much. "Disaster recovery" in Celery is a bit tricky; not all things will work as you expect :(
You should break your solution into smaller pieces, maybe separating some tasks between different queues. At some point you'll find the code snippet which causes the problems.
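For example, a minimal sketch of routing different kinds of tasks to the two queues from the question (the task names are hypothetical; the setting name is the Celery 3.x / django-celery style):

CELERY_ROUTES = {
    # hypothetical slow task that talks to an external service
    'myapp.tasks.call_external_service': {'queue': 'low_priority_queue_name'},
    # hypothetical quick task that should never wait behind the slow ones
    'myapp.tasks.send_notification': {'queue': 'high_priority_queue_name'},
}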
Upvotes: 3
Reputation: 12310
A very basic queue watchdog can be implemented with just a single script that’s run every minute by cron. First, it fires off a task that, when executed (in a worker), touches a predefined file, for example:
with open('/var/run/celery-heartbeat', 'w'):
    pass
Then the script checks the modification timestamp on that file and, if it’s more than a minute (or 2 minutes, or whatever) away, sends an alarm and/or restarts the workers and/or the broker.
It gets a bit trickier if you have multiple machines, but the same idea applies.
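A minimal sketch of that idea (the task name, heartbeat path, threshold, and the supervisorctl restart command are assumptions for illustration, not part of the setup above):

import os
import subprocess
import time

from myproject.celery import app  # hypothetical Celery app module

HEARTBEAT_FILE = '/var/run/celery-heartbeat'
MAX_AGE = 120  # seconds of silence before we assume the workers are stuck

@app.task
def heartbeat():
    # Runs inside a worker; its only job is to prove tasks are being consumed.
    with open(HEARTBEAT_FILE, 'w'):
        pass

def check():
    # Called from cron every minute: fire a new heartbeat, then look at how
    # long ago the previous one actually ran.
    heartbeat.delay()
    try:
        age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
    except OSError:
        age = float('inf')  # file never written yet
    if age > MAX_AGE:
        # Alarm and/or restart the workers; supervisord is just one option.
        subprocess.call(['supervisorctl', 'restart', 'celery'])

if __name__ == '__main__':
    check()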
Upvotes: 4
Reputation: 31
I would think this is because of workers prefetching tasks. If this is still a problem, you can update Celery to 3.1 and use the -Ofair worker option. The config option that I tried using before -Ofair was CELERYD_PREFETCH_MULTIPLIER. However, setting CELERYD_PREFETCH_MULTIPLIER = 1 (its lowest value) does not help, since workers will still prefetch one task in advance.
See http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#prefork-pool-improvements and especially http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#caveats.
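For example, applied to the CELERYD_OPTS from the question (only -Ofair is new here; everything else is copied from that setup, and this assumes the workers are running Celery 3.1+):

CELERYD_OPTS="--time-limit=600 -Ofair -c:low_p 100 -c:high_p 50 -Q:low_p low_priority_queue_name -Q:high_p high_priority_queue_name"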
Upvotes: 3