Reputation: 7765
I have a CPU-intensive Celery task, and within the task the work can be further parallelized using joblib. By default, starting a Celery worker creates a pool with max concurrency equal to the number of CPUs/cores (which is 36 in my case).
My question is: with this configuration, will each worker process be limited to a single core and therefore not benefit from joblib's parallelization? Or will a process use all the cores when there are no other tasks in the worker's queue?
For example:
@app.task  # picked up by a Celery worker process
def a_task():
    algo = Algo(n_jobs=5)  # further parallelization in the task
    ....
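(For context, the default pool size comes from the machine's CPU count, which can be checked directly:)

import multiprocessing

# Celery's default worker concurrency equals the CPU count:
print(multiprocessing.cpu_count())  # -> 36 on this machine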
Upvotes: 4
Views: 1639
Reputation: 19822
No, it does not. Celery can't restrict a worker process to a single core. It is up to the operating system how it spreads the load of those 36 worker processes, but broadly speaking you can say that each will have a core to run on. Just to remind you, the worker processes themselves barely use CPU in your case; most of the CPU time will be consumed by joblib.
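If you want to leave cores free for joblib's inner parallelism, one option is to cap the worker pool yourself. A minimal sketch (the module name proj and the broker URL are placeholders):

from celery import Celery

# Hypothetical app; the broker URL is a placeholder.
app = Celery("proj", broker="redis://localhost:6379/0")

# Cap the prefork pool instead of the default one-process-per-core
# (36 here), leaving cores free for each task's joblib workers.
app.conf.worker_concurrency = 8

# Equivalent on the command line: celery -A proj worker --concurrency=8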
Tasks executed by the Celery worker processes use joblib.Parallel, and no matter which backend you pick (multiprocessing or threading) you end up oversubscribing the CPU. (Using joblib.Parallel with n_jobs=1 makes no sense in this context, I think.)
This means that under heavy load, each core on your machine will run one Celery worker process plus many joblib.Parallel processes or threads (how many depends on the n_jobs value and the backend setting).
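To make the arithmetic concrete, here is a sketch of a task like the one in the question, with Algo replaced by a trivial _square helper that is purely illustrative:

from celery import Celery
from joblib import Parallel, delayed

app = Celery("proj")  # hypothetical app instance

def _square(x):  # stand-in for the real per-item work
    return x * x

@app.task
def a_task(values):
    # 36 worker processes * n_jobs=5 -> up to 180 joblib workers
    # competing for 36 cores under heavy load.
    return Parallel(n_jobs=5, backend="threading")(
        delayed(_square)(v) for v in values
    )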
Upvotes: 1