Chau Pham

Reputation: 5070

Pool.map vs Pool.map_async

I have a program like this:

from multiprocessing import Pool
import time

def f(x):
  # Heavy code here, just to take time
  for i in range(10000):
    for j in range(10000):
      pass  # do nothing

  print x

if __name__ == '__main__':
  pool = Pool(processes=4)

  pool.map(f, range(10))
  r  = pool.map_async(f, range(10))

  # DO STUFF
  print 'HERE'
  print 'MORE'
  r.wait()
  print 'Done'

As far as I know, pool.map will return in order, whereas pool.map_async will not. I tried to figure out the difference between them, but I haven't got it yet.

Actually, I have read some posts, for example: Python multiprocessing : map vs map_async

but I'm still confused. My questions are:

  1. What is the difference between the two functions?
  2. When I run the code above, I got this:

1 3 2 0 4 6 5 7 8 9 HERE MORE 1 0 3 2 5 4 6 7 8 9 Done

I expected pool.map to return the output in order, but it didn't! Why didn't it return in order? Or have I misunderstood the function?

  3. I think that when pool.map is called, the main process (the following code, like
    r = pool.map_async(f, range(10)); print 'HERE'; print 'MORE') continues running. So I expected "HERE" and "MORE" to be printed between the numbers, something like

3 2 0 4 6 HERE 5 7 8 9 1 0 3 2 MORE 5 4 6 7 8 9 Done

But it happened another way. Why doesn't it run as I expect?

  4. If I comment out the heavy code, the f function is now just:

    def f(x): print x

then both functions return the output in order (I ran it many times, and it always printed the same result). So why does it behave differently with and without the heavy code?

Any help would be appreciated. Thank you.

Upvotes: 2

Views: 7068

Answers (1)

l4l1lu

Reputation: 31

from multiprocessing import Pool
import time

def f(x):
  # Heavy code here, just to take time
  for i in range(10000):
    for j in range(10000):
      pass  # do nothing
  print x
  return x

if __name__ == '__main__':
  pool = Pool(processes=4)

  print pool.map(f, range(10))
  r  = pool.map_async(f, range(10))

  # DO STUFF
  print 'HERE'
  print 'MORE'
  r.wait()
  print 'Done'
  print r.get()

  1. pool.map_async does not block your script, whereas pool.map does (as mentioned by quikst3r).
  2. I slightly adapted your script to be more illustrative. As you can see, the returned results are in order in both cases; the difference is that once pool.map_async has been started, the subsequent code runs right away. The output is:

    1
    3
    0
    2
    4
    5
    7
    6
    8
    9
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    HERE
    MORE
    3
    2
    1
    0
    5
    4
    6
    7
    8
    9
    Done
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    

The print order, however, is not deterministic: it depends on the overhead of distributing the jobs and on the individual load of your CPUs.
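
To make the blocking vs. non-blocking difference easier to see without relying on print order, here is a minimal sketch in Python 3 syntax (the script above is Python 2). The names slow_square and run_demo are made up for this illustration, and the sleep is only there so that later inputs tend to finish first. Both calls should still hand back their results in input order; the ready() check right after map_async will most likely be False, though that depends on timing.

```python
import time
from multiprocessing import Pool


def slow_square(x):
    # Hypothetical worker: smaller inputs sleep longer, so tasks tend to
    # finish out of order, yet the returned lists stay in input order.
    time.sleep(0.05 * (5 - x))
    return x * x


def run_demo():
    with Pool(processes=4) as pool:
        # map blocks until every task is done, then returns an ordered list.
        blocking = pool.map(slow_square, range(5))

        # map_async returns an AsyncResult right away; work continues in
        # the background while the main process carries on.
        async_result = pool.map_async(slow_square, range(5))
        ready_right_away = async_result.ready()  # timing dependent

        # get() blocks until completion and also returns an ordered list.
        deferred = async_result.get(timeout=30)
    return blocking, ready_right_away, deferred


if __name__ == "__main__":
    blocking, ready_right_away, deferred = run_demo()
    print(blocking)
    print(ready_right_away)
    print(deferred)
```

So the ordering guarantee applies to the returned list, not to whatever the workers print while they run.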

Upvotes: 3
