Hoopes

Reputation: 4175

python multiprocessing pool.map not blocking?

I'm trying to parallelize some web requests in Python using multiprocessing, but it appears that occasionally not all of the functions I send to map complete.

These results appear whether I'm using Python 2 or 3.

Test script:

#!/usr/bin/env python

import multiprocessing

def my_print(string):
    print(string)

all_strings = ["alpaca", "bear", "cat", "dog", "elephant", "frog"]

pool = multiprocessing.Pool()
pool.map(my_print, all_strings)

I run it like so:

for i in `seq 1 50`; do ./test.py | wc -l; done | sort | uniq -c

And my results will look like:

6 5
44 6

...so most of the time, all 6 calls to the function run, but occasionally only 5 of them run before the overall script finishes. I expect the result to be 50 6 (i.e., all six functions executed on every run).

The documentation (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map) says of map: "It blocks until the result is ready." I took that to mean that all of the functions will complete before execution moves to the next line of code.

Am I misunderstanding that? Does using a pool require you to always call pool.close() and pool.join() to ensure the tasks are complete?
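
(To illustrate what I mean by that last question, is something like this required? A sketch using the same names as my script above:)

pool = multiprocessing.Pool()
pool.map(my_print, all_strings)
pool.close()  # signal that no more work will be submitted
pool.join()   # block until all worker processes have exited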

Edit: I'm running on AWS, if that makes any obvious difference - a coworker told me I should mention that.

Thanks very much in advance!

Upvotes: 3

Views: 3178

Answers (1)

Roland Smith

Reputation: 43533

It is true that all workers run their functions and return their values before map returns. But that doesn't mean you will see all of the strings.

You have multiple worker processes writing to the same file/terminal, and each worker buffers its stdout; anything still sitting in a worker's buffer when that process is torn down can be lost, particularly when stdout is a pipe (as with your | wc -l). To make the output reliable, import sys and call sys.stdout.flush() after every print() in the worker function.
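
A minimal sketch of that fix, keeping the rest of your script the same (the flush call is the only functional change; the close()/join() pair you asked about is included as the belt-and-braces way to let the workers exit cleanly):

#!/usr/bin/env python

import multiprocessing
import sys

def my_print(string):
    print(string)
    sys.stdout.flush()  # push the worker's buffered output out immediately

all_strings = ["alpaca", "bear", "cat", "dog", "elephant", "frog"]

if __name__ == '__main__':  # good practice; required on platforms that spawn rather than fork
    pool = multiprocessing.Pool()
    pool.map(my_print, all_strings)
    pool.close()  # no more tasks will be submitted to the pool
    pool.join()   # wait for the worker processes to exit cleanly

With the flush in place, every line should reach the pipe before a worker can be terminated, so your seq loop should consistently report 50 6.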

Upvotes: 2
