user30

Reputation: 57

How to do efficient multiprocessing?

I am using multiprocessing.Pool.map to run my code in parallel on my workstation, which has 10 physical cores (20 cores if I include the logical ones).

To summarize my code, I have to do some calculations with 2080 matrices. So I divide the 2080 matrices into 130 groups, each containing 16 matrices.

The calculation of these 16 matrices is then distributed over 16 cores (should I use only 10, since I have only 10 physical cores?) using multiprocessing.Pool.map.

My questions are:

(1) When I monitor the CPU usage in 'System Monitor' in Ubuntu, I often find that only 1 CPU shows 100% usage instead of all 16; the 16 CPUs show 100% usage only for short durations. Why does this happen? How can I improve it?

(2) Will I be able to improve the calculation time by dividing the 2080 matrices into 104 groups of 20 matrices each and then distributing the calculation of these 20 matrices over 10 or 16 cores?

My code snippet is as below:

import numpy as np
from multiprocessing import Pool

def f(k):
    adj=np.zeros((9045,9045),dtype='bool')
    # Calculate the elements of the matrices
    return adj

n_CPU=16
n_networks_window=16
window=int(2080/n_networks_window) # Dividing 2080 matrices into 130 segments having 16 matrices each
for i in range(window):
    range_window=range(int(i*2080/window),int((i+1)*2080/window))
    p=Pool(processes=n_CPU)
    adj=p.map(f,range_window)
    p.close()
    p.join()
    for k in range_window:
        pass # Some calculations using adj
np.savetxt(') # saving the output as a txt file

Any help will be really useful, as this is my first time parallelizing Python code.

Thank you.

EDIT: I tried the following change in the code and it is working fine now: pool.imap_unordered(f, range(2080), chunksize=260)
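
In context, the change fits into the code roughly as sketched below (a minimal sketch: the pool size and the per-matrix post-processing shown here are placeholders, not the exact original code):

import numpy as np
from multiprocessing import Pool

def f(k):
    adj=np.zeros((9045,9045),dtype='bool')
    # Calculate the elements of the matrix for index k
    return adj

if __name__ == '__main__':
    with Pool(processes=10) as pool:  # assumed pool size: one worker per physical core
        # chunksize=260 hands work to the workers in batches of 260 indices,
        # and imap_unordered yields each result as soon as it is ready
        for adj in pool.imap_unordered(f, range(2080), chunksize=260):
            pass  # some calculations using adj go here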

Upvotes: 0

Views: 391

Answers (1)

Andrea Corbellini

Reputation: 17781

Your problem is here:

for i in range(window):
    # [snip]
    p=Pool(processes=n_CPU)
    adj=p.map(f,range_window)
    p.close()
    p.join()
    # [snip]

You're creating a new Pool on every iteration of the loop and submitting only a few jobs to it. Those few jobs have to complete before the loop can continue and submit more jobs. In other words, you're not using parallelism to its full potential.

What you should do is create a single Pool, submit all the jobs, and then join it outside the loop:

p=Pool(processes=n_CPU)

for i in range(window):
    # [snip]
    p.map_async(f,range_window)
    # [snip]

p.close()
p.join()

Note the use of map_async instead of map: this is, again, to avoid waiting for a small portion of jobs to complete before submitting new jobs.
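
One detail worth noting with this sketch: map_async returns an AsyncResult rather than the list of matrices, so if the per-group results are needed afterwards, the handles have to be kept and .get() called on them. A rough sketch, reusing f, n_CPU and window from the question:

p=Pool(processes=n_CPU)

results=[]
for i in range(window):
    range_window=range(int(i*2080/window),int((i+1)*2080/window))
    # map_async returns immediately with an AsyncResult handle
    results.append((range_window, p.map_async(f,range_window)))

p.close()
p.join()

for range_window, res in results:
    adj=res.get()  # the list of matrices for this group, computed by the workers
    for k in range_window:
        pass  # some calculations using adj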

An even better approach is to call map/map_async only once, constructing a single range object and avoiding the for-loop:

with Pool(processes=n_CPU) as p:
    p.map(f, range(2080))  # This will block, but it's okay: we are
                           # submitting all the jobs at once

As for your question about the number of CPUs to use, first of all note that Pool will by default use all the CPUs available (as returned by os.cpu_count()) if you don't specify the processes argument -- give it a try.

It's not clear to me what you mean by having 10 physical cores and 20 logical ones. If you're talking about hyperthreading, then it's fine: use them all. If instead you're saying that you're using a virtual machine with more virtual CPUs than the host CPUs, then using 20 instead of 10 won't make much difference.
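
For reference, the default can be checked directly: os.cpu_count() counts logical CPUs, so a 10-core machine with hyperthreading reports 20. A small sketch (reusing f from the question):

import os
from multiprocessing import Pool

print(os.cpu_count())  # logical CPUs, e.g. 20 on a 10-core machine with hyperthreading

with Pool() as p:  # no processes argument: one worker per logical CPU
    adj_list = p.map(f, range(2080))

with Pool(processes=10) as p:  # or pin it to the number of physical cores
    adj_list = p.map(f, range(2080))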

Upvotes: 4
