usual me
usual me

Reputation: 8788

Differences between pool.map and pool.imap_unordered when the worker throws an exception

Say I do this:

import multiprocessing as mp

def f(x):
    print x
    raise OverflowError 

if __name__ == '__main__':
    pool = mp.Pool(processes=1)
    pool.map(f, range(10))
    pool.close()
    pool.join()

out:

0
3
6
9
Traceback (most recent call last):
  File "test1.py", line 9, in <module>
    pool.map(f, range(10))
  File "/Users/usualme/anaconda/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/usualme/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
OverflowError

Now I replace map by imap_unordered:

import multiprocessing as mp

def f(x):
    print x
    raise OverflowError 

if __name__ == '__main__':
    pool = mp.Pool(processes=1)
    for _ in pool.imap_unordered(f, range(10)):
        pass
    pool.close()
    pool.join()

out:

0
1
2
3
4
5
Traceback (most recent call last):
  File "test0.py", line 9, in <module>
6
7
    for _ in pool.imap_unordered(f, range(10)):
8
  File "/Users/usualme/anaconda/lib/python2.7/multiprocessing/pool.py", line 659, in next
9
    raise value
OverflowError

My questions:

Upvotes: 2

Views: 2411

Answers (1)

loopbackbee
loopbackbee

Reputation: 23332

map,imap, and imap_unordered process data in chunks. By their nature, they are prepared to submit those chunks to multiple processes in parallel.

for imap_unordered: why does it go all the way to 9 this time? What's different?

For imap (and presumably imap_unordered), the default chunksize is 1. Thus, f will actually start to execute for all values.

You can check this behaviour by passing the chunksize argument. If you provide chunksize=1 to map, you'll get behaviour similar to your other example.

for map: why is it jumping from 0 to 3, 6, and 9?

Although it's not mentioned in the documentation, it seems map's default chunksize is "smarter". Here, it looks like the chunk size was 3, so the chunks will be [[0,1,2],[3,4,5],[6,7,8],[9]].

I'm not sure why all this still happens when you only have one process, but my guess is that the implementation agglomerates all results before checking them. Exceptions, IIRC, are caught in the child process and serialized over IPC - they're a result as any other.

Upvotes: 2

Related Questions