Avraham

Reputation: 71

Python concurrent.futures: ProcessPoolExecutor fail to work

I'm trying to use ProcessPoolExecutor, but it fails. Here is an example (computing the greatest common divisor of two numbers) of a failed use. I don't understand what the mistake is:

from time import time
from concurrent.futures import ProcessPoolExecutor

def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

numbers = [(1963309, 2265973), (2030677, 3814172),
           (1551645, 2229620), (2039045, 2020802)]
start = time()
pool = ProcessPoolExecutor(max_workers=2)
results = list(pool.map(gcd, numbers))
end = time()
print('Took %.3f seconds' % (end - start))

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Upvotes: 5

Views: 5758

Answers (1)

Iguananaut

Reputation: 23306

Change your code to look like this, and it will work:

from time import time
from concurrent.futures import ProcessPoolExecutor
def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

numbers = [(1963309, 2265973), (2030677, 3814172),
           (1551645, 2229620), (2039045, 2020802)]

def main():
    start = time()
    pool = ProcessPoolExecutor(max_workers=3)
    results = list(pool.map(gcd, numbers))
    end = time()
    print('Took %.3f seconds' % (end - start))


if __name__ == '__main__':
    main()

On systems that support fork() this isn't necessary, because your script is imported just once, and each process started by the ProcessPoolExecutor then already has a copy of the objects in your global namespace, such as the gcd function. Once forked, the workers go through a bootstrap process in which they start running their target function (in this case, a worker loop that accepts jobs from the process pool executor) and never return to the original code in the main module from which they were forked.

By contrast, if you're using spawn-based processes, which are the default on Windows and macOS, a new process must be started from scratch for each worker, and so each worker must re-import your module. However, if your module starts a ProcessPoolExecutor directly in the module body, without guarding it behind if __name__ == '__main__':, there is no way for the workers to import your module without also starting a new ProcessPoolExecutor. So the error you're getting is essentially protecting you from creating an infinite process bomb.
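You can make this failure mode reproducible on any OS by forcing the spawn start method explicitly. A minimal sketch (assuming Python 3.7+ for the mp_context parameter; the small number pairs are just illustrative): with the guard in place the script works everywhere, and removing the guard reproduces the BrokenProcessPool-style failure even on Linux.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

if __name__ == '__main__':
    # Force the spawn start method even where fork is the default,
    # so the script behaves the same on Linux, macOS, and Windows.
    # Without the __main__ guard above, each spawned worker would
    # re-import this module and try to create its own pool.
    with ProcessPoolExecutor(max_workers=2,
                             mp_context=get_context('spawn')) as pool:
        print(list(pool.map(gcd, [(12, 18), (10, 4)])))
```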

This is mentioned in the docs for ProcessPoolExecutor:

The __main__ module must be importable by worker subprocesses. This means that ProcessPoolExecutor will not work in the interactive interpreter.

But they don't really make clear why that is, or what it means for the __main__ module to be "importable". When you write a simple script and run it like python foo.py, the script foo.py is loaded under the module name __main__, as opposed to the name foo you'd get from import foo. For the module to be "importable" here really means: importable without major side effects, like spawning new processes.
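You can observe this naming difference directly. A small sketch (the throwaway module name foo.py and the module_name variable are illustrative, not part of the answer): the same file reports a different __name__ depending on whether it is imported or run as a script.

```python
import os
import runpy
import sys
import tempfile

# Write a throwaway module that records its own __name__.
source = "module_name = __name__\n"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "foo.py")
    with open(path, "w") as f:
        f.write(source)
    sys.path.insert(0, d)

    import foo                                             # normal import
    as_script = runpy.run_path(path, run_name="__main__")  # like `python foo.py`

    print(foo.module_name)           # 'foo'
    print(as_script["module_name"])  # '__main__'
```

The if __name__ == '__main__': guard works precisely because only the script-style execution takes the '__main__' branch; a worker process re-importing the module never does.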

Upvotes: 5
