hoola_huupsh

Reputation: 31

parallelizing code with python pathos multiprocessing in docker

I am parallelizing code across 30 CPUs and confirmed that this works fine outside a container, using the Python library 'pathos'.

from pathos.pools import ProcessPool
import pandas as pd

pool = ProcessPool(nodes=30)
results = pool.map(func_that_needs_to_run_in_parallel, range(30))
pool.close()
pool.join()

results_df = pd.concat(results)

However, it doesn't work while running the code as part of a Flask app in a Docker container. I have three containers:

The code for the worker process can be summarised as:

#some code that needs to be run on only one cpu
#the above 'ProcessPool' code snippet for one particularly resource-intensive task
#some code that needs to be run on only one cpu

When I run the app, the parallelized part of the code in the worker container never uses more than 4 CPUs. I confirmed this with docker stats and htop. There are no CPU usage limits on the containers in the docker-compose yaml file.

htop shows the code running on only 4 CPUs at any one time, but it randomly switches which CPUs it uses during the task, so the worker container can evidently access all 48 CPUs.
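As a quick check of what the process is actually allowed to use, here is a hypothetical diagnostic you could run inside the worker container. It compares the number of CPUs visible on the machine with the number the current process may be scheduled on (the affinity check is Linux-only, so it is guarded):

```python
# Hypothetical diagnostic: how many CPUs does Python see, and how many
# is this process actually allowed to run on?
import os

total = os.cpu_count() or 1  # CPUs visible on the machine

# os.sched_getaffinity exists only on Linux; fall back to the total elsewhere.
if hasattr(os, "sched_getaffinity"):
    allowed = len(os.sched_getaffinity(0))
else:
    allowed = total

print(f"visible: {total}, schedulable: {allowed}")
```

If `schedulable` comes back as 4 inside the container, the ceiling is being imposed by the container's CPU configuration rather than by pathos.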

Summary: Running the app with this multiprocessing code still gives a speedup, but CPU usage is capped at 4 cores.

Upvotes: 1

Views: 813

Answers (1)

M__

Reputation: 636

Early Docker literature (2016) suggested one container per CPU, which is clearly no longer the case. The idea is to configure this at run time, in the same way you assign memory:

docker run -it --cpus="30" debian /bin/bash
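Since the question runs the containers via docker-compose, the equivalent setting there might look like the sketch below (assuming a recent Compose version that supports the service-level `cpus` key; the `worker` service name and build context are placeholders):

```yaml
# Hypothetical docker-compose fragment: grant the worker service
# an explicit CPU allocation. Key support depends on Compose version.
services:
  worker:
    build: ./worker   # assumed build context
    cpus: "30"        # Compose-spec service-level CPU limit
```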

I found the Docker documentation on container resource allocation useful here.


If pathos is the issue, why not switch to the standard-library multiprocessing.Pool() via its apply, map_async, or imap methods?

Upvotes: 0
