Reputation: 31
I am parallelizing code across 30 CPUs using the Python library pathos, and I have confirmed that this works fine outside a container:
from pathos.multiprocessing import ProcessPool
import pandas as pd

# fan the task out across 30 worker processes
pool = ProcessPool(nodes=30)
results = pool.map(func_that_needs_to_run_in_parallel, range(30))
pool.close()
pool.join()
results_df = pd.concat(results)
However, the parallelization does not behave the same way when the code runs as part of a Flask app in a Docker container. I have three containers:
The code for the worker process can be summarised as:
# some code that runs on a single CPU
# the 'ProcessPool' snippet above, for one particularly resource-intensive task
# some more code that runs on a single CPU
When I run the app, the parallelized part of the code in the worker container never uses more than 4 CPUs; I confirmed this with docker stats and htop. There are no CPU usage limits on the containers in the docker-compose YAML file. htop shows that the code runs on only 4 CPUs at any one time, but it randomly switches which CPUs it is using during the task, so the worker container can evidently access all 48 CPUs.
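For reference, this is the kind of check (not part of my original code) I can run inside the worker container to confirm how many CPUs the worker process is actually allowed to schedule onto:

# Minimal diagnostic sketch: compare the CPUs the container can see
# with the CPUs this particular process is allowed to run on.
import multiprocessing
import os

print("CPUs visible to the container:", multiprocessing.cpu_count())
# sched_getaffinity is Linux-only; it reports the set of CPUs this
# process may be scheduled on (48 in my case).
print("CPUs this process may use:", len(os.sched_getaffinity(0)))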
Summary: the multiprocessing code still speeds the app up, but CPU usage hits a ceiling of about 4 cores inside the container.
Upvotes: 1
Views: 813
Reputation: 636
Early Docker literature (2016) suggested one container per CPU, which is clearly not how it works. The idea is to configure CPU allocation at run time, in the same way you assign memory:
docker run -it --cpus="30" debian /bin/bash
I found the Docker documentation on container resource allocation useful, here.
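If you want to double-check what limit actually applies inside the container, something along these lines should work; it is a sketch that assumes cgroup v1 (on cgroup v2 the quota lives in /sys/fs/cgroup/cpu.max instead):

# Sketch: read the CFS quota that --cpus sets (cgroup v1 paths assumed).
from pathlib import Path

def effective_cpu_limit():
    quota_file = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    period_file = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    if quota_file.exists() and period_file.exists():
        quota = int(quota_file.read_text())
        period = int(period_file.read_text())
        if quota > 0:
            return quota / period   # e.g. --cpus="30" shows up as 30.0
    return None                     # no quota configured

print("Effective CPU limit:", effective_cpu_limit())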
If pathos is the issue, why not switch to the standard library's multiprocessing.Pool() via its apply, map_async, or imap methods? A sketch follows below.
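A hedged sketch of the same fan-out with multiprocessing.Pool; the worker function here is a stand-in for your own, and it must be defined at module level so it can be pickled:

import multiprocessing

import pandas as pd

def func_that_needs_to_run_in_parallel(i):
    # Stand-in for the real task; returns a small DataFrame per worker.
    return pd.DataFrame({"worker": [i]})

if __name__ == "__main__":
    # map blocks until every result is back; map_async/imap are the lazier variants.
    with multiprocessing.Pool(processes=30) as pool:
        results = pool.map(func_that_needs_to_run_in_parallel, range(30))
    results_df = pd.concat(results)
    print(results_df)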
Upvotes: 0