nvie

Reputation: 376

How to lay out a queue/worker structure to support large tasks for multiple environments?

For a Python/Django/Celery based deployment tool, we have the following setup:

  1. We currently use the default Celery setup. (One queue+exchange called "celery".)
  2. Each Task on the queue represents a deployment operation.
  3. Each task for an environment ends with a synchronisation phase that can take a (very) long time.

The following specs need to be fulfilled:

  1. Concurrency: tasks for multiple environments should be carried out concurrently.
  2. Locking: There may be at most one task running for each environment at the same time (i.e. environments lock).
  3. Throughput optimization: When multiple tasks are queued for a single environment, their sync phases may be combined for efficiency. So when a task nears its end, it should check whether new tasks are waiting in the queue for its environment and, if so, skip its sync phase (see the sketch after this list).
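
To make spec 3 concrete, here is a rough sketch of the intended task body (pending_tasks_for is a hypothetical helper that would have to ask the broker how many tasks are still queued for this environment):

# Sketch of the spec-3 check only; pending_tasks_for() and synchronise()
# are placeholders, not real code from our project.

def pending_tasks_for(environment):
    # placeholder -- in reality this would inspect the broker's queue
    return 0

def synchronise(environment):
    # the potentially (very) long sync phase
    pass

def deploy(environment):
    # ... deployment steps for this environment ...

    # If more tasks for this environment are already waiting, skip the
    # sync phase: the last waiting task will synchronise for all of them.
    if pending_tasks_for(environment) > 0:
        return
    synchronise(environment)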

What is the preferred way to implement this?

Some thoughts:

Upvotes: 12

Views: 1126

Answers (2)

Antonio Beamud

Reputation: 2341

For 1 and 2, use multiple queues and launch workers with -Q to specify which queue to listen on. Also configure CELERYD_PREFETCH_MULTIPLIER = 1 so each worker reserves only one task at a time.
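
For example, a minimal sketch (the per-environment queue names and the call-time routing are assumptions for illustration, not something the -Q advice prescribes exactly):

# settings.py -- one queue per environment
CELERYD_PREFETCH_MULTIPLIER = 1   # each worker reserves only one task at a time

# At call time, send each deployment to its environment's queue:
deploy.apply_async(args=[environment], queue='deploy.%s' % environment)

# Then start one single-process worker per environment, each pinned to
# its queue with -Q, so environments lock individually but still run
# concurrently with each other:
#   celery worker -Q deploy.production -c 1
#   celery worker -Q deploy.staging    -c 1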

To get the queue length (tested with RabbitMQ), you can use something like this:

from kombu.connection import BrokerConnection

# positional arguments: hostname, userid, password, virtual_host
connection = BrokerConnection(BROKER_HOST, BROKER_USER, BROKER_PASSWORD, BROKER_VHOST)
channel = connection.channel()
# passive=True only inspects the queue; it is not created if missing
name, jobs, consumers = channel.queue_declare('celery', passive=True)
print('celery: %d jobs in queue' % jobs)

As a side effect, queue_declare gives you the queue's length. Hope this helps.
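
That message count is exactly what spec 3 in the question needs: if a passive declare on the environment's queue reports waiting messages, the task that is about to finish can skip its sync phase and leave the synchronisation to the last waiting task.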

Upvotes: 2

babsher

Reputation: 1016

I would take a look at ZeroMQ; it can do messaging and multi-threading in one super-fast library. It also supports a large number of languages and has built-in load balancing.
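
For illustration only, a minimal pyzmq PUSH/PULL sketch (the port and messages are made up); PUSH sockets distribute messages round-robin across connected PULL workers, which is the built-in load balancing mentioned above:

import threading
import time
import zmq

context = zmq.Context()

def worker(worker_id):
    # Each worker PULLs tasks from the shared PUSH socket.
    receiver = context.socket(zmq.PULL)
    receiver.connect('tcp://127.0.0.1:5557')
    task = receiver.recv_string()
    print('worker %d got: %s' % (worker_id, task))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()

sender = context.socket(zmq.PUSH)
sender.bind('tcp://127.0.0.1:5557')
time.sleep(0.5)  # crude "slow joiner" workaround so both workers connect

# PUSH load-balances these round-robin across the two workers.
sender.send_string('deploy:production')
sender.send_string('deploy:staging')

for t in threads:
    t.join()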

Upvotes: 1
