Reputation: 8788
Say I do this:
import multiprocessing as mp
def f(x):
print x
raise OverflowError
if __name__ == '__main__':
pool = mp.Pool(processes=1)
pool.map(f, range(10))
pool.close()
pool.join()
out:
0
3
6
9
Traceback (most recent call last):
File "test1.py", line 9, in <module>
pool.map(f, range(10))
File "/Users/usualme/anaconda/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/Users/usualme/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
raise self._value
OverflowError
Now I replace map
by imap_unordered
:
import multiprocessing as mp
def f(x):
print x
raise OverflowError
if __name__ == '__main__':
pool = mp.Pool(processes=1)
for _ in pool.imap_unordered(f, range(10)):
pass
pool.close()
pool.join()
out:
0
1
2
3
4
5
Traceback (most recent call last):
File "test0.py", line 9, in <module>
6
7
for _ in pool.imap_unordered(f, range(10)):
8
File "/Users/usualme/anaconda/lib/python2.7/multiprocessing/pool.py", line 659, in next
9
raise value
OverflowError
My questions:
map
: why is it jumping from 0 to 3, 6, and 9?imap_unordered
: why does it go all the way to 9 this time? What's different?Upvotes: 2
Views: 2411
Reputation: 23332
map
,imap
, and imap_unordered
process data in chunks. By their nature, they are prepared to submit those chunks to multiple processes in parallel.
for imap_unordered: why does it go all the way to 9 this time? What's different?
For imap
(and presumably imap_unordered
), the default chunksize is 1. Thus, f
will actually start to execute for all values.
You can check this behaviour by passing the chunksize
argument. If you provide chunksize=1
to map
, you'll get behaviour similar to your other example.
for map: why is it jumping from 0 to 3, 6, and 9?
Although it's not mentioned in the documentation, it seems map
's default chunksize
is "smarter". Here, it looks like the chunk size was 3, so the chunks will be [[0,1,2],[3,4,5],[6,7,8],[9]]
.
I'm not sure why all this still happens when you only have one process, but my guess is that the implementation agglomerates all results before checking them. Exceptions, IIRC, are caught in the child process and serialized over IPC - they're a result as any other.
Upvotes: 2