Reputation: 11
I am working on a distributed processing application that I need to be as fast as possible. I have a deployment where a leader application writes Celery tasks to Redis, and N listening workers execute the tasks.
I have benchmarked my process to the point where 1 worker processing 1 task is exactly as fast as I need. I then need to scale this by 100x for the actual distributed processing use case, so I spin up 110 workers to listen for 100 tasks on the queue. I was expecting 100 of those workers to process the 100 tasks entirely in parallel, but in practice I am seeing some workers pick up multiple tasks from the queue while other workers don't process anything. This defeats the purpose of my benchmarking, as I can't make a single worker fast enough to process multiple tasks in the allotted time; I need the tasks processed entirely in parallel.
I am executing my tasks in a group, and the configuration is effectively all defaults aside from an override I found for the prefetch multiplier and late-ack settings (seen below):
app.conf.update(
    worker_prefetch_multiplier=1,  # reserve only one task per worker process
    task_acks_late=True,           # acknowledge only after the task finishes
)
Is there something else I can set so that each of the parallel workers takes exactly one task at the same time, and my total processing time for 100 tasks is approximately my benchmarked time for 1 task (basically within variance)?
Thanks for any insight/help!
I have tried a large number of Celery configuration settings but am struggling to find the right combination.
Upvotes: 1
Views: 38
Reputation: 805
Have you tried running all your workers with the solo pool?
celery --app=worker.app worker --pool=solo
This ensures that each worker processes only one task at a time. Also, please provide more details so we can help you better.
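For example, you could launch each of the 110 workers as its own solo-pool process with a unique node name, something like the sketch below (the app path and node names are just illustrative; adjust them to your setup):

for i in $(seq 1 110); do
    # each worker runs the solo pool, so it handles exactly one task at a time
    celery --app=worker.app worker --pool=solo --hostname=w$i@%h --detach
done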
Upvotes: 0