Goro

Reputation: 10249

How to monitor queue health in celery

I have the following set-up:

My problem is that the queue sometimes seems to "back up": it stops consuming tasks. There seem to be two scenarios for this:

I would appreciate any information or pointers on:

Also, I am starting my tasks from django-celery.

Upvotes: 13

Views: 8050

Answers (3)

Artem Mezhenin

Reputation: 5757

@goro, if you are making requests to external services, you should try the gevent or eventlet pool implementation instead of spawning a huge number of workers. I also had a problem where Celery workers stopped consuming tasks; it was caused by a bug in the celery+gevent+sentry (raven) combination.

One thing I figured out about Celery is that it can work fine without any monitoring if everything is done right (I'm currently running >50M tasks per day), but if it isn't, monitoring will not help you very much. "Disaster recovery" in Celery is a bit tricky; not everything works as you would expect :(

You should break your solution into smaller pieces, and maybe split some tasks across different queues. At some point you'll find the code snippet that causes the problems.
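
As a rough illustration of moving some tasks onto their own queue, a routing entry in the Django settings might look like the sketch below. It uses the old-style CELERY_ROUTES setting name; the task path and queue name are made up for the example and would need to match your own project.

```python
# settings.py fragment; the task path and queue name are hypothetical.
CELERY_ROUTES = {
    # Send the task that calls an external service to its own queue.
    'myapp.tasks.fetch_remote': {'queue': 'external'},
}

# A dedicated worker could then consume only that queue with a gevent pool,
# for example (the concurrency value is an arbitrary choice):
#   celery worker -Q external -P gevent -c 100
```

If the backlog then shows up only on one queue, that narrows down which tasks are blocking.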

Upvotes: 3

Vasiliy Faronov

Reputation: 12310

A very basic queue watchdog can be implemented with just a single script that’s run every minute by cron. First, it fires off a task that, when executed (in a worker), touches a predefined file, for example:

with open('/var/run/celery-heartbeat', 'w'):
    pass

Then the script checks the modification timestamp on that file and, if it’s more than a minute (or 2 minutes, or whatever) old, sends an alarm and/or restarts the workers and/or the broker.

It gets a bit trickier if you have multiple machines, but the same idea applies.
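
The cron side of this watchdog could look roughly like the sketch below. It is only an illustration: the 120-second threshold is an assumption, and `send_heartbeat` and `alert` are hypothetical callables you would supply yourself (for instance, the `.delay` of the task that touches the file, and a function that sends mail or restarts the workers).

```python
import os
import time

HEARTBEAT_FILE = '/var/run/celery-heartbeat'  # path used by the heartbeat task
MAX_AGE_SECONDS = 120  # assumed threshold; keep it above the cron interval


def heartbeat_age(path, now=None):
    """Return seconds since `path` was last modified, or None if it is missing."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return None
    return (time.time() if now is None else now) - mtime


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """True if the heartbeat file is missing or older than `max_age` seconds."""
    age = heartbeat_age(path, now)
    return age is None or age > max_age


def run_watchdog(send_heartbeat, alert, path=HEARTBEAT_FILE,
                 max_age=MAX_AGE_SECONDS):
    """One cron run: queue a new heartbeat task, then check the file's age."""
    send_heartbeat()  # hypothetical, e.g. touch_heartbeat.delay()
    if is_stale(path, max_age):
        alert('celery heartbeat is stale: %s' % path)
        return False
    return True
```

Because the newly queued task has usually not run yet when the file is checked, each run is effectively judging the heartbeat from the previous run, which is why the threshold should be longer than the cron interval.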

Upvotes: 4

holmars

Reputation: 31

I would think this is because of workers prefetching tasks. If this is still a problem, you can upgrade Celery to 3.1 and use the -Ofair worker option. The config option I tried before -Ofair was CELERYD_PREFETCH_MULTIPLIER. However, setting CELERYD_PREFETCH_MULTIPLIER = 1 (its lowest effective value) does not help, since each worker will still prefetch one task in advance.

See http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#prefork-pool-improvements and especially http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#caveats.
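
For reference, the two knobs mentioned above would be set roughly as follows. This is a sketch that assumes the old-style setting names used with django-celery; the worker command line is shown only as a comment.

```python
# settings.py fragment, using the old-style setting name from this answer.
# The prefetch limit is this multiplier times the number of worker processes.
CELERYD_PREFETCH_MULTIPLIER = 1

# With Celery 3.1 or later, fair scheduling is enabled on the command line:
#   celery worker -Ofair
```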

Upvotes: 3
