Reputation: 1635
I am trying to test Python multiprocessing inside a Docker container, but even if the processes are created successfully (I have 8 CPUs and 8 processes are created), they always take only one physical CPU. Here is my code:
from sklearn.externals.joblib.parallel import Parallel, delayed
import multiprocessing
import pandas
import numpy
from scipy.stats import linregress
import random
import logging

def applyParallel(dfGrouped, func):
    retLst = Parallel(n_jobs=multiprocessing.cpu_count())(delayed(func)(group) for name, group in dfGrouped)
    return pandas.concat(retLst)

def compute_regression(df):
    result = {}
    (slope, intercept, rvalue, pvalue, stderr) = linregress(df.date, df.value)
    result["slope"] = [slope]
    result["intercept"] = [intercept]
    return pandas.DataFrame(result)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    logging.info("start")
    random_list = []
    for i in range(1, 10000):
        for j in range(1, 100):
            random_list.append({"id": i, "date": j, "value": random.random()})
    df = pandas.DataFrame(random_list)
    df = applyParallel(df.groupby('id'), compute_regression)
    logging.info("end")
I tried multiple Docker options at launch, like --cpus or --cpuset-cpus, but it always uses only one physical CPU. Is this an issue with Docker, Python, or the OS? The Docker version is 1.13.1.
The result of cpu_count():
>>> import multiprocessing
>>> multiprocessing.cpu_count()
8
During the run, here is the output of top. We can see the main process and the 8 child processes, but I find the percentages weird.
And then, if I change to 4 processes, the total amount of CPU used is still the same:
Upvotes: 33
Views: 33132
Reputation: 7191
You can test that multiprocessing is working fine with the following commands:
$ docker run -it --rm ubuntu:20.04
root@somehash:/# apt update && apt install stress
root@somehash:/# stress --cpu 8 # 8 if you have 8 cores
If you have multiple cores, you can run htop or top in another terminal, and you should see all the cores running. If you used htop, you should see something like the following.
If you are at this step, then everything is working fine.
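If you'd rather not install stress, a minimal Python stand-in (a sketch of my own, not part of the original test) is to start one busy-looping worker per core and watch htop/top in another terminal:

import multiprocessing

def burn():
    while True:  # spins at ~100% on one core until the process is killed
        pass

if __name__ == '__main__':
    workers = [multiprocessing.Process(target=burn, daemon=True)
               for _ in range(multiprocessing.cpu_count())]
    for w in workers:
        w.start()
    input("All cores loaded; press Enter to stop...")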
Furthermore, when I ran the script you provided, I saw my processors being used as they should; you can see the image below. (I also added the process list to show it; I ran your script inside an ipython terminal. I also changed from sklearn.externals.joblib.parallel import Parallel, delayed to from joblib.parallel import Parallel, delayed, because it was not working for me otherwise.)
I hope the information provided helps. For other clues, you might like to check your version of docker.
Upvotes: 4
Reputation: 49
''' Distributed load among several Docker containers using Python multiprocessing capabilities '''
import random
import time
import subprocess
import queue
from multiprocessing import Pool, Queue, Lock

LOCK = Lock()
TEST_QUEUE = Queue()

class TestWorker(object):
    ''' This class is executed by each container '''

    @staticmethod
    def run_test(container_id, value):
        ''' Operation to be executed for each container '''
        cmd = ['docker exec -it {0} echo "I am container {0}!, this is message: {1}"'
               .format(container_id, value)]
        process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        for line in process.stdout:
            print(line.decode('utf-8')[:-2])
        process.wait()

    @staticmethod
    def container(container_id):
        ''' Here we get a value from the shared queue '''
        while not TEST_QUEUE.empty():
            LOCK.acquire()
            try:
                value = TEST_QUEUE.get(block=False)
                time.sleep(0.5)
            except queue.Empty:
                print("Queue empty ):")
                LOCK.release()  # release before returning so other workers are not blocked
                return
            print("\nProcessing: {0}\n".format(value))
            LOCK.release()
            TestWorker.run_test(container_id, value)

def master():
    ''' Main controller to set containers and test values '''
    qty = input("How many containers you want to deploy: ")
    msg_val = input("How many random values you want to send among these containers: ")
    print("\nGenerating test messages...\n")
    for _ in range(int(msg_val)):
        item = random.randint(1000, 9999)
        TEST_QUEUE.put(item)
    ids = []
    for _ in range(int(qty)):
        container_id = subprocess.run(["docker", "run", "-it", "-d", "centos:7"],
                                      stdout=subprocess.PIPE)
        container_id = container_id.stdout.decode('utf-8')[:-1]
        ids.append(container_id)
    pool = Pool(int(qty))
    pool.map(TestWorker.container, ids)
    pool.close()

if __name__ == '__main__':
    master()
Upvotes: 2
Reputation: 14264
From https://docs.docker.com/get-started - "Fundamentally, a container is nothing but a running process, with some added encapsulation features applied to it in order to keep it isolated from the host and from other containers."
Docker runs on a host machine. That host machine (or virtual machine) has a certain number of physical (or virtual) CPUs. The reason that multiprocessing.cpu_count() displays 8 in your case is because that is the number of CPUs your system has. Using Docker options like --cpus or --cpuset-cpus doesn't change your machine's hardware, which is what cpu_count() is reporting.
On my current system:
# native
$ python -c 'import multiprocessing as mp; print(mp.cpu_count())'
12
# docker
$ docker run -it --rm --cpus 1 --cpuset-cpus 0 python python -c 'import multiprocessing as mp; print(mp.cpu_count())'
12
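As a side note (my addition, not from the Docker docs): on Linux, os.sched_getaffinity(0) returns the set of CPUs the calling process is allowed to run on, so unlike cpu_count() it does reflect --cpuset-cpus:

import multiprocessing
import os

# cpu_count() reports the host's CPUs; the affinity mask shrinks under --cpuset-cpus
print("cpu_count():         ", multiprocessing.cpu_count())
print("sched_getaffinity(0):", len(os.sched_getaffinity(0)))

Under docker run --cpuset-cpus 0 ... the second number drops to 1, while the first stays at the host total.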
From https://docs.docker.com/config/containers/resource_constraints/#cpu - "By default, each container’s access to the host machine’s CPU cycles is unlimited." But you can limit containers with options like --cpus or --cpuset-cpus.
--cpus can be a floating-point number up to the number of physical CPUs available. You can think of this number as the numerator in the fraction <--cpus arg> / <physical CPUs>. If you have 8 physical CPUs and you specify --cpus 4, you're telling Docker to use no more than 50% (4/8) of your total CPU time. --cpus 1.5 would use 18.75% (1.5/8).
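If you want to verify the --cpus quota from inside a container, it is exposed through the cgroup filesystem. The sketch below is an assumption to check against your own setup: cgroup v2 hosts expose /sys/fs/cgroup/cpu.max (e.g. "150000 100000" for --cpus 1.5, or "max 100000" when unlimited), while cgroup v1 hosts use cpu.cfs_quota_us and cpu.cfs_period_us instead.

from pathlib import Path

def effective_cpus():
    # cgroup v2: a single file with "<quota> <period>" in microseconds
    v2 = Path("/sys/fs/cgroup/cpu.max")
    if v2.exists():
        quota, period = v2.read_text().split()
        return None if quota == "max" else int(quota) / int(period)
    # cgroup v1: quota is -1 when unlimited
    quota = int(Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us").read_text())
    period = int(Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us").read_text())
    return None if quota < 0 else quota / period

print(effective_cpus())  # e.g. 1.5 under `docker run --cpus 1.5`, None if unlimited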
--cpuset-cpus actually does limit specifically which physical/virtual CPUs to use.
(And there are many other CPU-related options that are covered in docker's documentation.)
Here is a smaller code sample:
import logging
import multiprocessing
import sys

import psutil
from joblib.parallel import Parallel, delayed

def get_logger():
    logger = logging.getLogger()
    if not logger.hasHandlers():
        handler = logging.StreamHandler(sys.stdout)
        formatter = logging.Formatter("[%(process)d/%(processName)s] %(message)s")
        handler.setFormatter(formatter)
        handler.setLevel(logging.DEBUG)
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger

def fn1(n):
    get_logger().debug("fn1(%d); cpu# %d", n, psutil.Process().cpu_num())

if __name__ == "__main__":
    get_logger().debug("main")
    Parallel(n_jobs=multiprocessing.cpu_count())(delayed(fn1)(n) for n in range(1, 101))
Running this both natively and within docker will log lines such as:
[21/LokyProcess-2] fn1(81); cpu# 11
[28/LokyProcess-9] fn1(82); cpu# 6
[29/LokyProcess-10] fn1(83); cpu# 2
[31/LokyProcess-12] fn1(84); cpu# 0
[22/LokyProcess-3] fn1(85); cpu# 3
[23/LokyProcess-4] fn1(86); cpu# 1
[20/LokyProcess-1] fn1(87); cpu# 7
[25/LokyProcess-6] fn1(88); cpu# 3
[27/LokyProcess-8] fn1(89); cpu# 4
[21/LokyProcess-2] fn1(90); cpu# 9
[28/LokyProcess-9] fn1(91); cpu# 10
[26/LokyProcess-7] fn1(92); cpu# 11
[22/LokyProcess-3] fn1(95); cpu# 9
[29/LokyProcess-10] fn1(93); cpu# 2
[24/LokyProcess-5] fn1(94); cpu# 10
[23/LokyProcess-4] fn1(96); cpu# 1
[20/LokyProcess-1] fn1(97); cpu# 9
[23/LokyProcess-4] fn1(98); cpu# 1
[27/LokyProcess-8] fn1(99); cpu# 4
[21/LokyProcess-2] fn1(100); cpu# 5
Notice that all 12 CPUs are in use on my system.
Running the same program with docker run --cpus 1 ... will still result in all 12 CPUs being used across the 12 processes started, just as if the --cpus argument weren't present; it only limits the percentage of total CPU time Docker is allowed to use.
Running the same program with docker run --cpuset-cpus 0-1 ... will result in only 2 physical CPUs being used by the 12 processes started:
[11/LokyProcess-2] fn1(35); cpu# 0
[11/LokyProcess-2] fn1(36); cpu# 0
[12/LokyProcess-3] fn1(37); cpu# 1
[11/LokyProcess-2] fn1(38); cpu# 0
[15/LokyProcess-6] fn1(39); cpu# 1
[17/LokyProcess-8] fn1(40); cpu# 0
[11/LokyProcess-2] fn1(41); cpu# 0
[10/LokyProcess-1] fn1(42); cpu# 1
[11/LokyProcess-2] fn1(43); cpu# 1
[13/LokyProcess-4] fn1(44); cpu# 1
[12/LokyProcess-3] fn1(45); cpu# 0
[12/LokyProcess-3] fn1(46); cpu# 1
To answer the statement "they always take only one physical CPU": this is only true if the --cpuset-cpus argument specifies exactly one CPU.
(As a side note, the reason logging is set up the way it is in the example is because of an open bug in joblib.)
Upvotes: 20
Reputation: 686
Try creating the machine from scratch (replace the numerical values with the desired ones):
docker-machine rm default
docker-machine create -d virtualbox --virtualbox-cpu-count=8 --virtualbox-memory=8192 --virtualbox-disk-size=10000 default
This is just to be on the safe side. And now the important part: specify the number of cores before running your image. The following command will use 8 cores.
docker run -it --cpuset-cpus="0-7" your_image_name
And verify inside Docker that you succeeded, not only in Python but also with nproc.
Good luck, and let us know how it went! 😊
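For example, a quick check could look like this (your_image_name is a placeholder; nproc respects the container's CPU affinity mask, so with --cpuset-cpus="0-7" it should print 8, assuming the host has at least 8 cores):

docker run --rm --cpuset-cpus="0-7" your_image_name nproc

Meanwhile, python -c 'import multiprocessing; print(multiprocessing.cpu_count())' inside the same container would still print the host's total, as explained in the other answers.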
Upvotes: 4
Reputation: 339
multiprocessing.cpu_count() gives 2 on my machine without passing the --cpus option.
Head over to https://docs.docker.com/engine/admin/resource_constraints/#cpu for more information about Docker container resource constraints.
Upvotes: 6