Reputation: 1
I am having trouble reaping the benefits of multiprocessing in Python. Basically, the computation time increases with every extra core added. So my guess is that it's due to overhead, but I am not sure what exactly I am doing wrong or how it could be improved/overcome. My real problem is a bit more complex, but I have prepared a simpler example to shed some light on my troubles.
Brief description:
I have a list of objects which are independent of each other. For every object I need to call a function which takes a dictionary of other objects as input, and that function is evaluated multiple times for each object in the initial list. So it's basically a loop within a loop.
Below is the code for a very simplified version of the problem:
import time
import multiprocessing as mp
import numpy as np

def test_fun(iter_nr, mp_dict, i, return_dict):
    dumm = 0
    for j in range(iter_nr):
        for val in mp_dict.values():
            dumm += val
    return_dict[i] = dumm

if __name__ == '__main__':
    # Manager setup lives under the main guard so the script also
    # works with the 'spawn' start method.
    manager = mp.Manager()
    return_dict = manager.dict()
    mp_dict = manager.dict()
    for i in range(100):
        mp_dict[str(i)] = 1

    nproc = [2, 4, 6, 8, 10, 12, 16, 20]
    nr_iter = 2*4*6*8*10
    print('Total number of iterations: ', nr_iter)

    for n_proc in nproc:
        nr_iter_array = (nr_iter / n_proc) * np.ones(n_proc)
        print('Nr CPUs: ', n_proc)
        print('Nr iterations per process: ', int(nr_iter_array[0]))
        start_time = time.time()
        jobs = []  # reset per run so we only join this run's processes
        for i in range(n_proc):
            p = mp.Process(target=test_fun,
                           args=(int(nr_iter_array[i]), mp_dict, i, return_dict))
            p.start()
            jobs.append(p)
        for job in jobs:
            job.join()
        end_time = time.time()
        print(round(end_time - start_time, 3), 'sec')
And here is the output:
Total number of iterations: 3840
Nr CPUs: 2
Nr iterations per process: 1920
0.661 sec
Nr CPUs: 4
Nr iterations per process: 960
1.385 sec
Nr CPUs: 6
Nr iterations per process: 640
1.674 sec
Nr CPUs: 8
Nr iterations per process: 480
1.524 sec
Nr CPUs: 10
Nr iterations per process: 384
1.992 sec
Nr CPUs: 12
Nr iterations per process: 320
2.072 sec
Nr CPUs: 16
Nr iterations per process: 240
2.186 sec
Nr CPUs: 20
Nr iterations per process: 192
2.607 sec
As you can see, the computation time increases with the number of cores, which is not what I would expect. Does anyone have any idea what is going on here and how to overcome it?
Upvotes: 0
Views: 1260
Reputation: 12205
This is a case of process creation overhead. Your tasks are too small for there to be any benefit from multiprocessing.
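You can see that fixed cost directly by timing processes that do no work at all. A quick sketch (the numbers will depend on your machine and start method):

import time
import multiprocessing as mp

def noop():
    pass  # no work at all, so any measured time is pure overhead

if __name__ == '__main__':
    for n_proc in [2, 8, 20]:
        start = time.time()
        procs = [mp.Process(target=noop) for _ in range(n_proc)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(n_proc, 'processes:', round(time.time() - start, 3), 'sec')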
When I ran your code, I got the same result as you did. However, when I started tweaking this part:

for i in range(100):
    mp_dict[str(i)] = 1

and changed the range to 1000, there was some benefit from multiprocessing (I am running on a laptop with a limited number of cores, so your results may vary):
Total number of iterations: 3840
Nr CPUs: 2
Nr iterations per process: 1920
0.237 sec
Nr CPUs: 4
Nr iterations per process: 960
0.216 sec
Nr CPUs: 6
Nr iterations per process: 640
0.221 sec
Nr CPUs: 8
Nr iterations per process: 480
0.224 sec
Nr CPUs: 10
Nr iterations per process: 384
0.233 sec
Nr CPUs: 12
Nr iterations per process: 320
0.231 sec
Nr CPUs: 16
Nr iterations per process: 240
0.243 sec
Nr CPUs: 20
Nr iterations per process: 192
0.255 sec
When I changed it to 10000, the improvement was there again:
Total number of iterations: 3840
Nr CPUs: 2
Nr iterations per process: 1920
1.578 sec
Nr CPUs: 4
Nr iterations per process: 960
1.063 sec
Nr CPUs: 6
Nr iterations per process: 640
1.076 sec
Nr CPUs: 8
Nr iterations per process: 480
1.083 sec
Nr CPUs: 10
Nr iterations per process: 384
1.098 sec
Nr CPUs: 12
Nr iterations per process: 320
1.08 sec
Nr CPUs: 16
Nr iterations per process: 240
1.099 sec
Nr CPUs: 20
Nr iterations per process: 192
1.122 sec
A tweak I did not try, but which would probably change the numbers again, is to replace your self-managed Processes with a Pool from multiprocessing. This example would probably work nicely with a Pool: a Pool launches N processes only once and then keeps reusing them, handing a worker more work as soon as it is free. Your code spawns and kills a lot of processes, while a pool does that only once. A rough sketch of what that could look like follows.
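This is only a sketch, not a drop-in replacement for your code: it assumes the worker can simply return its result (instead of writing into a managed dict) and that the dictionary can be passed as a plain dict, which gets pickled once per task:

import time
import multiprocessing as mp

def test_fun(iter_nr, work_dict):
    # Same inner loops as in the question, but the result is returned
    # instead of being written into a managed dict.
    dumm = 0
    for _ in range(iter_nr):
        for val in work_dict.values():
            dumm += val
    return dumm

if __name__ == '__main__':
    work_dict = {str(i): 1 for i in range(1000)}  # plain dict, no manager
    nr_iter = 2*4*6*8*10
    for n_proc in [2, 4, 6, 8, 10, 12, 16, 20]:
        start_time = time.time()
        with mp.Pool(processes=n_proc) as pool:
            # Give each worker an equal share of the iterations.
            tasks = [(nr_iter // n_proc, work_dict)] * n_proc
            results = pool.starmap(test_fun, tasks)
        print(n_proc, 'workers:', round(time.time() - start_time, 3), 'sec')

Note that this still creates a fresh pool for every timing run; in real code you would create the pool once and keep feeding it work.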
This is not a miracle cure, though. My favourite multiprocessing bugbear is data transmission: multiprocessing and pools move data between processes through queues, which are very slow, and managers use those same queues, so they are slow as well. The more data you transmit into or out of your workers, the more time you spend on that instead of on useful work. It is often better to re-engineer the code so that a worker does a start-to-finish task instead of sending a lot of data in and out. The snippet below gives a rough feel for the manager overhead.
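As a rough illustration, here is a sketch you can run in a single process, since every access to a managed dict is a round trip to the manager process regardless of who makes it:

import time
import multiprocessing as mp

def time_reads(d, repeats=1000):
    # Repeatedly scan the dict's values; on a managed dict every
    # .values() call is an IPC round trip, on a plain dict it is not.
    start = time.time()
    total = 0
    for _ in range(repeats):
        for val in d.values():
            total += val
    return round(time.time() - start, 3)

if __name__ == '__main__':
    manager = mp.Manager()
    managed = manager.dict({str(i): 1 for i in range(100)})
    plain = managed.copy()  # one-off copy; no IPC after this point
    print('managed dict:', time_reads(managed), 'sec')
    print('plain dict:  ', time_reads(plain), 'sec')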
Anyway, the conclusion is that it always depends on your case. Multiprocessing does not always improve performance, and even when it does, the improvement can be modest. It takes a bit of assessment, as you have done, to find the optimal spot.
This is a bit of a non-answer but it is too long for comments.
Upvotes: 1