Reputation: 1791
We have written an application that emails billing information to more than 200,000 of our client's customers.
Previously we used a batch-processing program that took well over 2 days to send all the emails sequentially.
We have moved the entire program to Celery, and we have already seen a remarkable improvement with a regular 2-worker setup.
Has anyone benchmarked Celery?
The documentation says that the number of workers should equal the number of CPUs for best performance. Suppose we virtualize the server and set up 32 vCPUs on a physical 8-core server; can we then run it at a concurrency of 32?
The mails are sent through separate mail servers; this server runs only RabbitMQ, Celery, and the application.
Please advise on the right number of workers, threads, and vCPUs to avoid unnecessary queueing and delays.
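For reference, here is a stripped-down sketch of the task we run; the names, addresses, and SMTP details below are illustrative placeholders, not our actual code:

    # tasks.py - simplified sketch of the email-sending task
    import smtplib
    from email.message import EmailMessage

    from celery import Celery

    # Broker URL is a placeholder; RabbitMQ runs on the same host.
    app = Celery("billing", broker="amqp://guest@localhost//")

    @app.task(bind=True, max_retries=3, default_retry_delay=60)
    def send_invoice(self, recipient, subject, body):
        """Send one billing email; retry on transient SMTP failures."""
        msg = EmailMessage()
        msg["From"] = "billing@example.com"
        msg["To"] = recipient
        msg["Subject"] = subject
        msg.set_content(body)
        try:
            with smtplib.SMTP("mail.example.com") as smtp:
                smtp.send_message(msg)
        except smtplib.SMTPException as exc:
            raise self.retry(exc=exc)

We currently start the workers with celery -A tasks worker --concurrency=2.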
Thank you!
Upvotes: 3
Views: 2951
Reputation: 30472
Short answer: you will need to understand what you are doing, and you will probably have to measure it yourself.
Longer:
The main question is whether your tasks are CPU-bound or I/O-bound (network/disk). If your tasks are CPU-bound (probably things like rendering templates or generating images), you won't gain anything by adding workers beyond the number of CPUs. In all likelihood, though, your tasks are I/O-bound (network): if the workers spend most of their time waiting for network acknowledgments, and there is no bottleneck in the mail servers and so on, you will probably be able to reach higher throughput by using more workers.
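One Celery-specific knob worth knowing here: the default pool is prefork (one OS process per concurrency slot), but for I/O-bound work Celery also supports greenlet-based pools (eventlet or gevent) that let a single process keep hundreds of SMTP conversations waiting concurrently. A hedged sketch of what to experiment with; the concurrency numbers are starting points to measure, not recommendations:

    # default prefork pool: one process per slot, a sensible match for CPU-bound tasks
    celery -A tasks worker --concurrency=8

    # eventlet pool: cooperative green threads, suited to tasks that mostly wait on the network
    pip install eventlet
    celery -A tasks worker -P eventlet --concurrency=200

Whether a greenlet pool beats simply adding prefork workers depends on where the bottleneck really is, which is exactly why you should measure.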
To understand this much better, I highly recommend walking slowly through David Beazley's eye-opening presentation, An Introduction to Python Concurrency. It does not cover Celery or Tornado, but it gives an excellent overview of the underlying technology and problems, and lays out the solutions (with examples) as well.
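As for measuring it yourself: a crude but effective approach is to enqueue a fixed batch of tasks and time how long the workers take to drain it at different concurrency settings. A minimal sketch, assuming a Celery task named send_invoice(recipient, subject, body) (substitute your real task) and a result backend configured so join() can wait:

    # benchmark.py - enqueue a fixed batch of tasks and time the drain
    import time

    from celery import group

    from tasks import send_invoice  # illustrative import; use your own task

    N = 1000  # large enough to smooth out per-task noise

    start = time.monotonic()
    job = group(send_invoice.s("test@example.com", "bench", "body") for _ in range(N))
    result = job.apply_async()
    result.join()  # blocks until every task in the group has finished
    elapsed = time.monotonic() - start
    print(f"{N} tasks in {elapsed:.1f}s = {N / elapsed:.1f} tasks/s")

Repeat with 2, 8, 16, 32 workers (or pool sizes) and watch where throughput stops improving; that knee is your real concurrency limit, regardless of the vCPU count.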
Upvotes: 4