user3595632

Reputation: 5730

Celery: why do its distributed tasks run slower than multiprocessing?

My computer has 16 CPU cores, and I tried to compare the time taken to finish the task with Celery versus multiprocessing.

Here are the experiments (update_daily_price is the method that crawls daily stock price data for a given symbol from the web):

1) Single-threaded process


for symbol in symbol_list:
    update_daily_price(symbol)

It took a total of "12 mins 54 secs".

2) multiprocessing library


from multiprocessing import Pool

# Fan the symbols out across 8 worker processes
pool = Pool(8)
pool.map(update_daily_price, symbol_list)
pool.close()
pool.join()

It took a total of "2 mins 10 secs".

3) Celery's apply_async()

I started the worker process with celery --workdir=trading/ --concurrency=8 -P eventlet worker

And ran the tasks like this:


from celery import group, shared_task

@shared_task
def update_dailyprice_task1(symbol):
    update_daily_price(symbol)

jobs = group(update_dailyprice_task1.s(symbol) for symbol in symbol_list)
jobs.apply_async()

It took a total of "10 mins 24 secs".
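(For the timing above to cover task completion, the caller has to wait on the GroupResult, since apply_async() returns immediately. A minimal sketch of such a measurement, assuming a result backend is configured; this is not part of the original code:)

import time

start = time.perf_counter()
result = jobs.apply_async()
result.get()  # block until every task in the group has finished
print(f"took {time.perf_counter() - start:.0f} secs")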

As you can see, there is not much of a difference between 1) and 3). Am I missing something about how Celery distributes tasks?

Upvotes: 0

Views: 1165

Answers (2)

user3595632

Reputation: 5730

Solved this by using billiard.

Reference : https://github.com/celery/celery/issues/4525
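For reference, a minimal sketch of the workaround (my reading of the linked issue, not code from the original post): billiard is Celery's maintained fork of multiprocessing, and its Pool works inside a Celery worker, where the stdlib Pool typically fails because worker child processes are daemonic. The task name and pool size below are illustrative assumptions:

from billiard import Pool  # drop-in replacement for multiprocessing.Pool

from celery import shared_task

@shared_task
def update_dailyprice_batch(symbol_list):  # hypothetical task name
    # update_daily_price is the crawling function from the question
    pool = Pool(8)
    pool.map(update_daily_price, symbol_list)
    pool.close()
    pool.join()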

Upvotes: 0

kardaj

Reputation: 1935

The problem comes from your celery command:

celery --workdir=trading/ --concurrency=8 -P eventlet worker

According to this page, you're asking Celery to create one worker with 8 green threads, which is different from creating 8 processes: it effectively creates a single process running 8 green threads. Since your function is probably computation-heavy, you end up with results comparable to the single-process run.

In order to use multiple processes, you need the prefork worker pool. The following command should get you results comparable to the multiprocessing library:

celery --workdir=trading/ --concurrency=8 worker
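For explicitness, the same command with the pool named (prefork is Celery's default when -P is omitted):

celery --workdir=trading/ --concurrency=8 -P prefork worker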

Upvotes: 1
