Juicy

Reputation: 12520

Remove a revoked Celery task

According to the docs on task.revoke():

All worker nodes keeps a memory of revoked task ids, either in-memory or persistent on disk

And

Revoking tasks works by sending a broadcast message to all the workers, the workers then keep a list of revoked tasks in memory. When a worker starts up it will synchronize revoked tasks with other workers in the cluster.

This sounds like tasks are still around after you've revoked them. I don't understand why there's not a clear way to revoke the task and remove it from the queue.

The docs seem to imply you need to keep a list of revoked tasks indefinitely to ensure new workers don't pick them up in some circumstances.

I'm also aware that there's a function to completely purge the task queue, but that's not what I'm looking for.

Is there a way to revoke a task and purge it (and only it) from the task queue in Celery?

Upvotes: 5

Views: 2757

Answers (2)

DejanLekic

Reputation: 19787

There is a tiny, nice section in the Celery documentation called "revoke: Revoking tasks" - please read it.

In short: by default, revoke only stops a task that has not started yet. If the task is just waiting in the queue, the worker that receives it skips it instead of executing it (simplest case). It is more complicated when the task is already running: revoke will not stop it unless you pass terminate=True, which tells the Celery worker to send a signal (SIGTERM by default) to the process executing the task. In some cases that may not work. Just like you have "zombie processes" in Linux, you may have "zombie tasks" that are difficult to revoke (I know it is not the best analogy, but you will get the point), in which case you revoke them with SIGKILL (by revoking with terminate=True, signal='SIGKILL'). Once revoke succeeds, you will not see the task in the queue.
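To make the three cases concrete, here is a minimal sketch. The helper revoke_task and its running/force parameters are made up for illustration; app is assumed to be an already configured celery.Celery instance, and the only Celery call used is app.control.revoke with its terminate and signal arguments.

```python
def revoke_task(app, task_id, running=False, force=False):
    """Revoke a single task by id.

    `app` is assumed to be a configured celery.Celery instance.
    This helper and its parameter names are hypothetical.
    """
    if not running:
        # Task has not started: workers that receive it will skip it.
        app.control.revoke(task_id)
    elif not force:
        # Task is already running: ask the worker to signal the process.
        # The default signal used by revoke is SIGTERM.
        app.control.revoke(task_id, terminate=True)
    else:
        # Last resort for a task that ignores the default signal.
        app.control.revoke(task_id, terminate=True, signal="SIGKILL")
```

Note that terminate acts on whatever the worker process is running at that moment, so it is best kept for tasks that are really stuck.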

Upvotes: 1

Eric M.

Reputation: 2967

Celery does not offer a way to remove a single message from the queue. You can only remove all of them with a purge, or delete the message yourself with a manual command in your broker.

However, this may not matter: once a worker picks up a revoked task, the task is removed from the queue. So you don't have to maintain the list of revoked ids forever.

An id only needs to stay on this list until a worker has processed the task, which can take a while if the workers are busy or the task is scheduled for later.

The list should be persistent if all your workers could be stopped at the same time and you want to keep the tasks flagged as revoked. Otherwise, a newly started worker asks the already running workers which tasks to flag as revoked.
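For the persistent case, Celery has a worker_state_db setting (the same thing as the worker's --statedb option). A minimal sketch, assuming you load settings from a config module; the module name and the path are only examples:

```python
# celeryconfig.py (hypothetical module name, example path)
# File in which each worker stores its state, including the ids of
# revoked tasks, so the list survives a restart of all workers.
worker_state_db = "/var/run/celery/worker.state"
```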

Note: I analyzed a case with Redis as both the broker and the result backend to get this answer. The revoked task was eventually removed from the queue and showed up in the result backend, marked as revoked.

Example:

  1. The task with id 'A' is pushed to the queue and scheduled to run in 1 hour
  2. revoke() is called on task 'A', so a message is sent to all workers to flag the task as revoked. The id is now in the revoked list of each worker (the worker log shows "Tasks flagged as revoked: A")
  3. The task 'A' is still in the queue, waiting for its ETA
  4. After one hour, a worker picks up the task. As the task is flagged as revoked, the worker does not execute it but immediately writes the task result to the backend. The result says that the task is revoked (so not executed).
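The four steps above can be sketched with a toy model in plain Python. This is not Celery's real implementation: ToyWorker, the deque used as a queue and the results dict are all made up to illustrate the flow, and the ETA is ignored.

```python
from collections import deque


class ToyWorker:
    """Toy stand-in for a worker; not Celery's real code."""

    def __init__(self):
        self.revoked = set()  # ids flagged as revoked, kept in memory

    def handle(self, task_id, func, results):
        if task_id in self.revoked:
            # Skip execution and only record the revoked state.
            results[task_id] = "REVOKED"
        else:
            results[task_id] = func()


queue = deque()    # stands in for the broker queue
results = {}       # stands in for the result backend
workers = [ToyWorker(), ToyWorker()]


def revoke(task_id):
    # Broadcast: every worker flags the id, the queue is left untouched.
    for worker in workers:
        worker.revoked.add(task_id)


# 1. Tasks are pushed to the queue.
queue.append(("A", lambda: "ran A"))
queue.append(("B", lambda: "ran B"))

# 2. Task 'A' is revoked.
revoke("A")

# 3. 'A' is still sitting in the queue.
still_queued = [task_id for task_id, _ in queue]

# 4. A worker consumes the queue; 'A' is skipped, 'B' runs.
while queue:
    task_id, func = queue.popleft()
    workers[0].handle(task_id, func, results)
```

The point is that revoke never touches the queue itself: the message only disappears once a worker consumes it.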

I don't know the exact reason why you can't remove tasks from the queue directly. But my guesses are:

  • Not all brokers may allow removing an element from the middle of the queue
  • Removing a task immediately while keeping the task system consistent is probably harder. And as the Celery team has limited manpower, they may not want to support something complex if a simpler solution does the job

Upvotes: 2
