Don Kirkby

Reputation: 56230

Python multiprocessing pool swallows exception from first chunk's input

I'm writing a script that reads a bunch of files, and then processes the rows from all of those files in parallel.

My problem is that the script behaves strangely if it can't open some of the files. If it's one of the later files in the list, then it processes the earlier files, and reports the exception when it gets to the bad file. However, if it can't open one of the first files in the list, then it processes nothing, and doesn't report an exception.

How can I make the script report all exceptions, no matter where they are in the list?

The key problem seems to be the chunk size of pool.imap(). If the exception occurs before the first chunk is submitted, it fails silently.

Here's a little script to reproduce the problem:

from multiprocessing.pool import Pool


def prepare():
    for i in range(5):
        yield i+1

    raise RuntimeError('foo')


def process(x):
    return x


def test(chunk_size):
    pool = Pool(10)
    n = raised = None
    try:
        for n in pool.imap(process, prepare(), chunksize=chunk_size):
            pass
    except RuntimeError as ex:
        raised = ex
    print(chunk_size, n, raised)


def main():
    print('chunksize n raised')
    for chunk_size in range(1, 10):
        test(chunk_size)


if __name__ == '__main__':
    main()

The prepare() function generates five integers, then raises an exception. That generator gets passed to pool.imap() with chunk sizes from 1 to 9. For each chunk size, the script prints the chunk size, the last result received (n), and any exception raised.

chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None None
7 None None
8 None None
9 None None

You can see that the exception is properly reported until the chunk size increases enough that the exception happens before the first chunk is submitted. Then it silently fails, and no results are returned.

Upvotes: 2

Views: 306

Answers (1)

torek

Reputation: 489898

If I run this (I modified it slightly for py2k and py3k cross compatibility) with Python 2.7.13 and 3.5.4 on my own handy system, I get:

$ python2 --version
Python 2.7.13
$ python2 mptest.py
chunksize    n raised
        1    5 foo
        2    4 foo
        3    3 foo
        4    4 foo
        5    5 foo
        6 None None
        7 None None
        8 None None
        9 None None
$ python3 --version
Python 3.5.4
$ python3 mptest.py
chunksize    n raised
        1    5 foo
        2    4 foo
        3    3 foo
        4    4 foo
        5    5 foo
        6 None foo
        7 None foo
        8 None foo
        9 None foo

I presume it is not surprising that n is None for chunk sizes > 5: no pool process can ever be handed six arguments, because the generator returned by prepare() yields only five values before it raises.
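To make the chunking argument concrete, here is a simplified sketch of how the input might be batched. It assumes imap() assembles each chunk by slicing the input iterator before submitting it; chunks_before_error() is a hypothetical helper written for this illustration, not part of multiprocessing.

```python
import itertools


def prepare():
    for i in range(5):
        yield i + 1
    raise RuntimeError('foo')


def chunks_before_error(chunk_size):
    """Return the chunks fully assembled before the generator raises.

    Hypothetical helper: a simplified model of the batching, not the
    library's actual code.
    """
    it = prepare()
    complete = []
    try:
        while True:
            # Pull up to chunk_size items; if the generator raises part-way
            # through, the items already pulled for this chunk are lost.
            chunk = tuple(itertools.islice(it, chunk_size))
            if not chunk:
                break
            complete.append(chunk)
    except RuntimeError:
        pass
    return complete


if __name__ == '__main__':
    for size in (2, 3, 6):
        print(size, chunks_before_error(size))
    # 2 [(1, 2), (3, 4)]
    # 3 [(1, 2, 3)]
    # 6 []
```

Under this model the last item of the last complete chunk matches the n column (4, 3, None), and for chunk sizes above 5 the exception is hit while the very first chunk is still being assembled.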

What does seem surprising is that Python 2.7.13 reports None for the exception with chunk sizes above 5, while Python 3.5.4 reports foo.

This is Issue #28699, fixed in commit 794623bdb2. The fix has apparently been backported to Python 3.5.4, but not to Python 2.7.13, nor apparently to your own Python 3 version.
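If upgrading is not an option, one possible workaround is to keep the input generator from raising inside imap() at all. This is a minimal sketch, not a tested fix for every version: guarded() and run() are hypothetical names, and ThreadPool stands in for Pool only so the example runs without pickling concerns (it inherits imap() from Pool).

```python
from multiprocessing.pool import ThreadPool  # assumption: stand-in for Pool


def prepare():
    for i in range(5):
        yield i + 1
    raise RuntimeError('foo')


def process(x):
    return x


def guarded(iterable, errors):
    """Yield items from iterable; record an exception instead of raising."""
    try:
        for item in iterable:
            yield item
    except Exception as ex:
        # End the iteration normally and let the caller inspect errors.
        errors.append(ex)


def run(chunk_size):
    errors = []
    pool = ThreadPool(4)
    try:
        results = list(pool.imap(process,
                                 guarded(prepare(), errors),
                                 chunksize=chunk_size))
    finally:
        pool.close()
        pool.join()
    return results, errors


if __name__ == '__main__':
    for chunk_size in (2, 6):
        results, errors = run(chunk_size)
        print(chunk_size, results, [str(e) for e in errors])
```

Because the wrapped generator simply stops when the source raises, every item produced before the failure is processed whatever the chunk size, and the caller can re-raise or log the entries in errors afterwards.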

Upvotes: 1
