Reputation: 56230
I'm writing a script that reads a bunch of files, and then processes the rows from all of those files in parallel.
My problem is that the script behaves strangely if it can't open some of the files. If it's one of the later files in the list, then it processes the earlier files, and reports the exception when it gets to the bad file. However, if it can't open one of the first files in the list, then it processes nothing, and doesn't report an exception.
How can I make the script report all exceptions, no matter where they are in the list?
The key problem seems to be the chunk size of pool.imap(). If the exception occurs before the first chunk is submitted, it fails silently.
Here's a little script to reproduce the problem:
from multiprocessing.pool import Pool


def prepare():
    for i in range(5):
        yield i+1
    raise RuntimeError('foo')


def process(x):
    return x


def test(chunk_size):
    pool = Pool(10)
    n = raised = None
    try:
        for n in pool.imap(process, prepare(), chunksize=chunk_size):
            pass
    except RuntimeError as ex:
        raised = ex
    print(chunk_size, n, raised)


def main():
    print('chunksize n raised')
    for chunk_size in range(1, 10):
        test(chunk_size)


if __name__ == '__main__':
    main()
The prepare() function generates five integers, then raises an exception. That generator gets passed to pool.imap() with chunk sizes from 1 to 9. For each chunk size, the script prints the chunk size, the last result received (n), and any exception raised.
chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None None
7 None None
8 None None
9 None None
You can see that the exception is properly reported until the chunk size increases enough that the exception happens before the first chunk is submitted. Then it silently fails, and no results are returned.
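To see why the n column follows that pattern, here is a simplified model of how the input might be split into chunks. It is only a sketch of the idea, not the actual multiprocessing source; chunks() and delivered_before_error() are names made up for illustration.

```python
from itertools import islice


def prepare():
    # Same generator as in the script above: five values, then a failure.
    for i in range(5):
        yield i + 1
    raise RuntimeError('foo')


def chunks(iterable, size):
    # Simplified model of chunking: pull up to `size` items at a time.
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, size))
        if not chunk:
            return
        yield chunk


def delivered_before_error(size):
    # Collect the chunks that were completely built before the source raised.
    done = []
    try:
        for chunk in chunks(prepare(), size):
            done.append(chunk)
    except RuntimeError:
        pass
    return done
```

In this model, a chunk size of 2 builds (1, 2) and (3, 4) and then fails while building the third chunk, which matches n = 4 above. A chunk size of 6 fails while building the very first chunk, so nothing is ever submitted.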
Upvotes: 2
Views: 306
Reputation: 489898
If I run this (I modified it slightly for py2k and py3k cross compatibility) with Python 2.7.13 and 3.5.4 on my own handy system, I get:
$ python2 --version
Python 2.7.13
$ python2 mptest.py
chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None None
7 None None
8 None None
9 None None
$ python3 --version
Python 3.5.4
$ python3 mptest.py
chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None foo
7 None foo
8 None foo
9 None foo
I presume the fact that it fails (and hence prints None for n) for chunk sizes > 5 is not surprising, since no pool process can get six arguments when the generator produced by calling prepare() yields only 5 values before raising.
What does seem surprising is that Python 2.7.13 says None for the exceptions for chunk sizes above 5, while Python 3.5.4 says foo for the exceptions.
This is Issue #28699, fixed in commit 794623bdb2. The fix has apparently been backported to Python 3.5.4, but not to Python 2.7.13, nor apparently to your own Python 3 version.
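If you are stuck on an interpreter without the fix, one possible workaround is to keep the exception from ever reaching the pool's chunking code: wrap the generator so that it records the error and then ends normally, and check the recorded errors afterwards. The sketch below is untested on the affected versions; capture_errors() and run() are hypothetical helpers, and it uses ThreadPool (which shares the imap() implementation with Pool) only so that the snippet runs on its own.

```python
from multiprocessing.pool import ThreadPool


def prepare():
    # Same generator as in the question: five values, then a failure.
    for i in range(5):
        yield i + 1
    raise RuntimeError('foo')


def process(x):
    return x


def capture_errors(source, errors):
    # Hypothetical helper: pass items through, and if the source raises,
    # record the exception in `errors` and end the iteration normally.
    try:
        for item in source:
            yield item
    except Exception as ex:
        errors.append(ex)


def run(chunk_size):
    # Returns every result produced before the failure, plus the errors seen.
    errors = []
    with ThreadPool(4) as pool:
        wrapped = capture_errors(prepare(), errors)
        results = list(pool.imap(process, wrapped, chunksize=chunk_size))
    return results, errors
```

With this wrapper, run(6) should return all five results along with the recorded RuntimeError, instead of nothing at all. The trade-off is that the caller has to inspect errors explicitly, since nothing is raised from the loop over imap().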
Upvotes: 1