Reputation: 5730
My computer has 16 CPU cores, and I tried to compare the time to finish the task between Celery and multiprocessing.
Here are my experiments
(update_daily_price is the method that crawls daily price data for a given stock symbol from the web):
1) Single-threaded process

    for s in symbol_list:
        update_daily_price(s)

It took a total of 12 mins 54 secs.
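(For reference, a wall-clock measurement of this sort can be taken with a small wrapper; the time calls here are an illustrative addition, not part of the original code:)

    import time

    start = time.perf_counter()
    for s in symbol_list:
        update_daily_price(s)
    print(f"elapsed: {time.perf_counter() - start:.1f} secs")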
2) multiprocessing library

    from multiprocessing import Pool

    pool = Pool(8)
    pool.map(update_daily_price, symbol_list)
    pool.close()
    pool.join()

It took a total of 2 mins 10 secs.
3) Celery's apply_async()
I started the worker process with:

    celery --workdir=trading/ --concurrency=8 -P eventlet worker

And ran the tasks like this:

    from celery import shared_task, group

    @shared_task
    def update_dailyprice_task1(symbol):
        update_daily_price(symbol)

    jobs = group(update_dailyprice_task1.s(symbol) for symbol in symbol_list)
    jobs.apply_async()

It took a total of 10 mins 24 secs.
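(Note that group.apply_async() returns as soon as the tasks are published; to time the run to completion you have to block on the returned GroupResult, which requires a configured result backend. A minimal sketch, with the time calls as an illustrative addition:)

    import time

    start = time.perf_counter()
    result = jobs.apply_async()
    result.join()  # block until every task in the group has finished
    print(f"elapsed: {time.perf_counter() - start:.1f} secs")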
As you can see, there is almost no difference between 1) and 3). Am I missing something in distributing Celery tasks?
Upvotes: 0
Views: 1165
Reputation: 5730
Solved this by using billiard.
Reference: https://github.com/celery/celery/issues/4525
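(The commonly cited billiard workaround looks like the sketch below: billiard is Celery's own fork of multiprocessing, and its Pool can be created inside a worker task, whereas a plain multiprocessing.Pool is blocked there because daemonized worker processes are not allowed to have children. The task name update_dailyprice_all is hypothetical:)

    from billiard import Pool  # Celery's fork of multiprocessing
    from celery import shared_task

    @shared_task
    def update_dailyprice_all(symbols):
        # billiard's Pool can be created inside a Celery worker, where a
        # stock multiprocessing.Pool would raise "daemonic processes are
        # not allowed to have children"
        pool = Pool(8)
        pool.map(update_daily_price, symbols)
        pool.close()
        pool.join()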
Upvotes: 0
Reputation: 1935
The problem comes from your Celery command:

    celery --workdir=trading/ --concurrency=8 -P eventlet worker

According to this page, you're asking Celery to create one worker with 8 green threads, which is different from creating 8 processes: it effectively creates a single process that runs 8 cooperative threads. Since your function is probably computation heavy, you end up with results comparable to the single-process execution.
In order to use multiple processes, you need the prefork worker pool. Using the following command will get you results comparable to the multiprocessing library:

    celery --workdir=trading/ --concurrency=8 worker
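(Prefork is Celery's default pool, so the command above is equivalent to passing the pool explicitly:)

    celery --workdir=trading/ --concurrency=8 -P prefork worker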
Upvotes: 1