Reputation: 395
I wrote Python code for a Q-learning algorithm, and I have to run it multiple times since the algorithm has random output, so I use the multiprocessing module. The structure of the code is as follows:
import numpy as np
import scipy as sp
import multiprocessing as mp
# ...import other modules...
# ...define some parameters here...
# using multiprocessing
result = []
num_threads = 3
pool = mp.Pool(num_threads)
for cnt in range(num_threads):
    args = (RL_params + phys_params)  # arguments
    result.append(pool.apply_async(Q_learning, args))
pool.close()
pool.join()
There is no I/O operation in my code, my workstation has 6 cores (12 threads), and there is enough memory for this job. When I run the code with num_threads=1, it takes only 13 seconds and the job occupies only 1 thread, with CPU usage at 100% (checked with the top command).
[screenshot of CPU status with num_threads=1]
However, if I run it with num_threads=3 (or more), it takes more than 40 seconds and the job occupies 3 threads, each using 100% of a CPU core.
[screenshot of CPU status with num_threads=3]
I can't understand this slowdown, because there is no parallelization inside any of the self-defined functions and no I/O operation. It is also interesting that when num_threads=1, CPU usage is always below 100%, but when num_threads is larger than 1, CPU usage is sometimes 101% or 102%.
On the other hand, I wrote another simple test file which does not import numpy and scipy, and the problem never shows up there. I have noticed the question why isn't numpy.mean multithreaded?, and it seems my problem is due to the automatic parallelization of some methods in numpy (such as dot). But as the pictures show, I can't see any parallelization when I run a single job.
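If that were the cause, one way I could test it (just a sketch, assuming my numpy is built against OpenBLAS or MKL) would be to cap the BLAS thread count before numpy is imported and see whether the timings change:

import os

# Limit the BLAS backend (OpenBLAS/MKL) to one thread; this must be set
# before numpy is imported for the first time.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # imported only after the environment variables are set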
Upvotes: 1
Views: 2304
Reputation: 18625
When you use a multiprocessing pool, all the arguments and results get sent through pickle. This can be very processor-intensive and time-consuming. That could be the source of your problem, especially if your arguments and/or results are large. In those cases, Python may spend more time pickling and unpickling the data than it does running computations.
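As a rough check (a sketch only; the array below is just a stand-in for whatever you actually pass), you can time how long pickling the arguments takes:

import pickle
import time
import numpy as np

# Stand-in for the real arguments; replace with the tuple you pass to
# pool.apply_async (RL_params + phys_params in your code).
args = (np.random.rand(2000, 2000),)

start = time.perf_counter()
payload = pickle.dumps(args)          # this is what the pool sends to a worker
elapsed = time.perf_counter() - start
print("pickled size: %.1f MB, pickle time: %.3f s" % (len(payload) / 1e6, elapsed))

If the size is large or the time is a noticeable fraction of your 13-second run, serialization overhead is a likely culprit.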
However, numpy releases the global interpreter lock during computations, so if your work is numpy-intensive, you may be able to speed it up by using threading instead of multiprocessing. That would avoid the pickling step. See here for more details: https://stackoverflow.com/a/38775513/3830997
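A minimal sketch of the thread-based version, assuming Q_learning, RL_params and phys_params are defined as in your question (this only helps if Q_learning spends most of its time in numpy routines that release the GIL):

from multiprocessing.pool import ThreadPool

# Same structure as your code, but with threads instead of processes,
# so arguments and results are shared directly rather than pickled.
num_threads = 3
result = []
pool = ThreadPool(num_threads)
for cnt in range(num_threads):
    args = (RL_params + phys_params)
    result.append(pool.apply_async(Q_learning, args))
pool.close()
pool.join()
outputs = [r.get() for r in result]  # collect the return values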
Upvotes: 3