hoola_huupsh

Reputation: 31

parallelizing code with python pathos multiprocessing in docker

I am parallelizing code across 30 CPUs and confirmed that this works fine outside a container, using the Python library 'pathos'.

from pathos.pools import ProcessPool
import pandas as pd

pool = ProcessPool(nodes=30)
results = pool.map(func_that_needs_to_run_in_parallel, range(30))
pool.close()
pool.join()

results_df = pd.concat(results)

However, it doesn't work while running the code as part of a Flask app in a Docker container. I have three containers:

The code for the worker process can be summarised as:

#some code that needs to be run on only one cpu
#the above 'ProcessPool' code snippet for one particularly resource-intensive task
#some code that needs to be run on only one cpu

When I run the app, the parallelized part of the code in the worker container never uses more than 4 CPUs. I confirmed this with docker stats and htop. There are no CPU usage limits on the containers in the docker-compose yaml file.

htop shows the code running on only 4 CPUs at any one time, but it randomly switches which CPUs it uses during the task, so the worker container can evidently access all 48 CPUs.
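As a quick check of what the process is actually allowed to use, here is a hypothetical diagnostic you could run inside the worker container. It compares the number of CPUs visible on the machine with the number the current process may be scheduled on (the affinity check is Linux-only, so it is guarded):

```python
# Hypothetical diagnostic: how many CPUs does Python see, and how many
# is this process actually allowed to run on?
import os

total = os.cpu_count() or 1  # CPUs visible on the machine

# os.sched_getaffinity exists only on Linux; fall back to the total elsewhere.
if hasattr(os, "sched_getaffinity"):
    allowed = len(os.sched_getaffinity(0))
else:
    allowed = total

print(f"visible: {total}, schedulable: {allowed}")
```

If `schedulable` comes back as 4 inside the container, the ceiling is being imposed by the container's CPU configuration rather than by pathos.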

Summary: Running the app with this multiprocessing code still gives a speedup, but CPU usage is capped at 4 cores.

Upvotes: 1

Views: 813

Answers (1)

M__

Reputation: 636

Early Docker literature (2016) suggested one container per CPU, which is clearly no longer the case. The idea is to configure this at run time, in the same way you assign memory:

docker run -it --cpus="30" debian /bin/bash
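Since the question runs the containers via docker-compose, the equivalent setting there might look like the sketch below (assuming a recent Compose version that supports the service-level `cpus` key; the `worker` service name and build context are placeholders):

```yaml
# Hypothetical docker-compose fragment: grant the worker service
# an explicit CPU allocation. Key support depends on Compose version.
services:
  worker:
    build: ./worker   # assumed build context
    cpus: "30"        # Compose-spec service-level CPU limit
```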

I found the Docker documentation on container resource allocation useful here.


If pathos is the issue, why not switch to the standard-library multiprocessing.Pool() via its apply, map_async, or imap methods?

Upvotes: 0
