Hoopes

Reputation: 4175

python multiprocessing pool.map not blocking?

I'm trying to parallelize some web requests in Python using multiprocessing, but it appears that occasionally not all of the functions I send to map complete.

These results appear whether I'm using Python 2 or 3.

Test script:

#!/usr/bin/env python

import multiprocessing

def my_print(string):
    print(string)

all_strings = ["alpaca", "bear", "cat", "dog", "elephant", "frog"]

pool = multiprocessing.Pool()
pool.map(my_print, all_strings)

I run it like so:

for i in `seq 1 50`; do ./test.py | wc -l; done | sort | uniq -c

And my results will look like:

6 5
44 6

...so most of the time, all 6 calls to the function run, but occasionally only 5 of them run before the overall script finishes. I expect the result to be 50 6 (i.e., all six functions executed on every run).

The documentation (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map) says of map: "It blocks until the result is ready." I took that to mean that all of the functions will complete before execution moves to the next line of code.

Am I misunderstanding that? Does using a pool require you to always call pool.close() and pool.join() to ensure the tasks are complete?
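
(To illustrate what I mean by that last question, is something like this required? A sketch using the same names as my script above:)

pool = multiprocessing.Pool()
pool.map(my_print, all_strings)
pool.close()  # signal that no more work will be submitted
pool.join()   # block until all worker processes have exited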

Edit: I'm running on AWS, if that makes any obvious difference - a coworker told me I should mention that.

Thanks very much in advance!

Upvotes: 3

Views: 3178

Answers (1)

Roland Smith

Reputation: 43533

It is true that all workers run their functions and return their values before map returns. But that doesn't mean you will see all of the strings.

You have multiple worker processes writing to the same file/terminal, and each worker buffers its stdout; anything still sitting in a worker's buffer when that process is torn down can be lost, particularly when stdout is a pipe (as with your | wc -l). To make the output reliable, import sys and call sys.stdout.flush() after every print() in the worker function.
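
A minimal sketch of that fix, keeping the rest of your script the same (the flush call is the only functional change; the close()/join() pair you asked about is included as the belt-and-braces way to let the workers exit cleanly):

#!/usr/bin/env python

import multiprocessing
import sys

def my_print(string):
    print(string)
    sys.stdout.flush()  # push the worker's buffered output out immediately

all_strings = ["alpaca", "bear", "cat", "dog", "elephant", "frog"]

if __name__ == '__main__':  # good practice; required on platforms that spawn rather than fork
    pool = multiprocessing.Pool()
    pool.map(my_print, all_strings)
    pool.close()  # no more tasks will be submitted to the pool
    pool.join()   # wait for the worker processes to exit cleanly

With the flush in place, every line should reach the pipe before a worker can be terminated, so your seq loop should consistently report 50 6.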

Upvotes: 2
