Reputation: 7671
I am running a series of long-running, heavy-weight Celery tasks (which spawn multiple subprocesses) in a queue with CELERYD_CONCURRENCY = 4. Initially, 4 tasks are started, as they should be. However, as tasks finish, no new tasks are started to replace them, and soon Celery keeps the number of active tasks down to 1 or 2 until all tasks are complete (confirmed by Celery Flower).
When I only run simple tasks, such as the default Celery add function, everything works as expected.
Do the subprocesses started by Celery tasks (which share the task's process group ID) count toward the concurrency slots? Is there any way to make sure Celery counts only the tasks themselves?
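For reference, here is a minimal sketch of the pattern I'm describing; the task name, the external command, and the broker URL are placeholders:

    import subprocess
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker

    @app.task
    def heavy_task(input_path):
        # Spawn an external worker process; the task then blocks
        # in communicate() until the subprocess exits.
        proc = subprocess.Popen(
            ['some-heavy-tool', input_path],  # hypothetical command
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        out, err = proc.communicate()
        return proc.returncode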
Upvotes: 2
Views: 2460
Reputation: 2881
Celery uses prefork as its default execution pool, and every time you spawn a subprocess (another fork), it counts toward the number of concurrent processes running, i.e. the number in CELERYD_CONCURRENCY.
The way to avoid this is to use eventlet, which lets you spawn multiple asynchronous calls within each task, as long as your tasks don't make blocking calls such as subprocess.communicate.
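Switching a worker to the eventlet pool is done on the command line (here proj stands in for your app module, and the concurrency value is just an example):

    pip install eventlet
    celery -A proj worker --pool=eventlet --concurrency=100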
To further optimize, you can try splitting the tasks that use subprocess.communicate into a separate queue served by a prefork worker, and run everything that doesn't block in a worker using eventlet, as sketched below.
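A sketch of that split, using the same old-style settings as CELERYD_CONCURRENCY in the question; the task and queue names are hypothetical:

    # In your Celery settings: route the blocking task to its own queue.
    CELERY_ROUTES = {
        'tasks.heavy_task': {'queue': 'subproc'},  # hypothetical task name
    }

Then run two workers, one per pool:

    # prefork worker dedicated to the subprocess-spawning tasks
    celery -A proj worker -Q subproc -P prefork -c 4
    # eventlet worker for everything else (default queue is named 'celery')
    celery -A proj worker -Q celery -P eventlet -c 100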
Upvotes: 2