Reputation: 396
import time
from multiprocessing import Pool

def myfun(a):
    return a * 2

# The guard is required on platforms that spawn worker processes
# (Windows, macOS), or the Pool creation re-executes the module.
if __name__ == '__main__':
    p = Pool(5)

    k0 = time.time()
    p.map(myfun, [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10])
    k1 = time.time()
    print(k1 - k0)

    k0 = time.time()
    for i in [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10]:
        myfun(i)
    k1 = time.time()
    print(k1 - k0)
I am using the multiprocessing package in Python. As you can see, I executed the two snippets of code above separately. The first one, which uses Pool.map, takes more time than the second one, which runs serially. Can anyone explain why? I thought p.map() would be much faster. Is it not executed in parallel?
Upvotes: 0
Views: 119
Reputation: 35207
Indeed, as noted in the comments, some tasks take longer to run in parallel with multiprocessing. This is expected for very small tasks: each worker process has to spin up its own Python interpreter, and both the function and the data you pass to map have to be serialized and shipped to the workers. That takes time, so there is an overhead associated with using a multiprocessing.Pool. For very quick tasks, I suggest multiprocessing.dummy.Pool, which uses threads -- and thus minimizes setup overhead.
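To illustrate, here is a minimal sketch of the thread-based pool suggested above; `myfun` and the input list mirror the question's code (the worker count of 5 is just the question's value):

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # same API as Pool, but threads

def myfun(a):
    return a * 2

data = list(range(1, 11)) * 2  # [1..10, 1..10], as in the question

# Threads share the interpreter, so there is no process spin-up
# and no pickling of the function or the data.
with ThreadPool(5) as p:
    results = p.map(myfun, data)

print(results[:5])  # → [2, 4, 6, 8, 10]
```

For a task this cheap, even the thread pool will not beat the plain loop, but its overhead is far smaller than a process pool's.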
Try putting a time.sleep(x) in your function call and varying x. You'll see that as x increases, the function becomes more suitable to run in a thread pool, and then, for even more expensive x, in a process pool.
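As a sketch of that experiment with one fixed x (the `delay` value of 0.05 s below is an assumption, as are the helper names), using the thread pool from the answer; exact timings will vary by machine:

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def slow_double(a, delay=0.05):
    time.sleep(delay)  # simulate real work taking `delay` seconds
    return a * 2

data = list(range(20))

# Serial baseline: 20 calls * 0.05 s ≈ 1 s.
t0 = time.time()
serial = [slow_double(a) for a in data]
serial_time = time.time() - t0

# Thread pool: 5 workers, so ~4 rounds of 0.05 s ≈ 0.2 s.
# sleep() releases the GIL, so the threads genuinely overlap.
t0 = time.time()
with ThreadPool(5) as p:
    parallel = p.map(slow_double, data)
parallel_time = time.time() - t0

print(serial_time, parallel_time)
```

Once the per-call work dominates the pool's setup cost, the parallel version wins; push delay higher still and a process pool becomes worthwhile too.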
Upvotes: 1