Reputation: 100
I have been working with the multiprocessing module trying to parallelise a for loop that takes 27 min to run on a single core. I have 12 CPU cores at my disposal.
The core of the code I am using for parallelisation is given below:
    import multiprocessing as mp
    import pandas as pd

    def Parallel_Work(val, b, c):
        # Filter basis val to make a dataframe and do some work on it
        # ...

    values = pd.Series(["CompanyA",
                        "CompanyB",
                        "CompanyC"])  # Actual values list is quite big

    with mp.Pool(processes=4) as pool:
        results = [pool.apply(Parallel_Work,
                              args=(val, b, c))
                   for val in values.unique()]
When I run this code, I run into two things that I haven't been able to figure out:
1. None of the processes runs at 100% CPU usage. In fact, the combined CPU usage of all processes sums to 100% every time (screenshot attached). Are the processes really using different cores? If not, how do I make sure they do?
[Screenshot: results of the "top" command]
2. Four processes are spawned, but only two are active at any given point in time. Am I missing something here?
Please let me know if I can provide any more information.
Upvotes: 0
Views: 500
Reputation: 1755
I think you need to be using apply_async instead of apply, which blocks until the result is ready, so your loop only ever has one task running at a time.
See this SO question for details on apply, apply_async and map.
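A rough sketch of what I mean, not your actual code: parallel_work here is a made-up stand-in for your Parallel_Work (it just does some arithmetic on the name), and run_all and the values of b and c are placeholders I invented for illustration. The point is only that every task is submitted with apply_async first, and the results are collected with get() afterwards.

```python
import multiprocessing as mp


def parallel_work(val, b, c):
    # Hypothetical stand-in for the real per-company work in the question.
    return (val, len(val) * b + c)


def run_all(values, b, c, processes=4):
    # run_all is an illustrative helper name, not part of multiprocessing.
    with mp.Pool(processes=processes) as pool:
        # apply_async returns an AsyncResult immediately, so all tasks are
        # queued before we wait on any of them and the workers can overlap.
        pending = [pool.apply_async(parallel_work, args=(val, b, c))
                   for val in values]
        # get() blocks until that task is done; collecting in submission
        # order keeps the results aligned with the input values.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print(run_all(["CompanyA", "CompanyB", "CompanyC"], b=2, c=1))
```

With pool.apply in the list comprehension, each call waits for its result before the next one is submitted, which would be consistent with the combined CPU usage staying around 100%. The get() calls have to happen inside the with block, because leaving it terminates the pool.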
Upvotes: 1