zyxue

Reputation: 8908

concurrent.futures.ProcessPoolExecutor hangs when the function is a lambda or nested function

Can anyone provide insight into why using lambda or a nested function (f) would make concurrent.futures.ProcessPoolExecutor hang in the following code example?

import concurrent.futures


def f2(s):
    return len(s)


def main():
    def f(s):
        return len(s)

    data = ["a", "b", "c"]

    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        # results = pool.map(f, data)  # hangs
        # results = pool.map(lambda d: len(d), data)  # hangs
        # results = pool.map(len, data)  # works
        results = pool.map(f2, data)  # works

    print(list(results))


if __name__ == "__main__":
    main()

Upvotes: 3

Views: 6440

Answers (1)

Dr. Regex

Reputation: 166

Long story short, both multiprocessing.Pool and ProcessPoolExecutor must serialize everything before sending it to the worker processes. Serializing (pickling) a function stores only its qualified name; the worker then imports that name to reconstruct the function. For this to work, the function has to be defined at the top level of a module: nested functions and lambdas are not importable by the child process, so pickling them fails with an error like:

AttributeError: Can't pickle local object 'MyClass.mymethod.<locals>.mymethod'
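You can reproduce the underlying failure directly with pickle, without involving a pool at all (a minimal sketch; the function names here are illustrative):

```python
import pickle


def outer():
    # A nested function: it only exists inside outer's local scope,
    # so it has no importable module-level name.
    def inner(s):
        return len(s)

    return inner


try:
    pickle.dumps(outer())
except AttributeError as e:
    print(e)  # Can't pickle local object 'outer.<locals>.inner'
```

The pools hit exactly this error when handed a nested function or a lambda; depending on the Python version, ProcessPoolExecutor may hang rather than surface it.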

To avoid this problem, there are some workarounds out there, but I haven't found them reliable. If you're open to other packages, pathos is an alternative that actually works: it serializes with dill, which can handle nested functions and lambdas that pickle cannot. For instance, the following won't hang:

import pathos
import os


class SomeClass:

    def __init__(self):
        self.words = ["a", "b", "c"]

    def some_method(self):

        def run(s):
            return len(s)

        # `pool` is the module-level Pool created below; dill can
        # serialize the nested `run` function, so this does not hang.
        return list(pool.map(run, self.words))


pool = pathos.multiprocessing.Pool(os.cpu_count())
print(SomeClass().some_method())

and it will indeed print

[1, 1, 1]
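If you'd rather stay within the standard library, the usual workaround is to keep the worker function at module level and bind any extra arguments with functools.partial, which pickles fine as long as the wrapped function is importable (a sketch under that assumption; the names are illustrative):

```python
import concurrent.futures
import functools


def scaled_len(s, factor):
    # Top-level function: picklable because the worker process can
    # import it by its qualified name.
    return len(s) * factor


def main():
    words = ["a", "bb", "ccc"]
    # partial binds factor=2; the resulting object is picklable
    # because scaled_len itself lives at module level.
    worker = functools.partial(scaled_len, factor=2)
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        return list(pool.map(worker, words))


if __name__ == "__main__":
    print(main())  # [2, 4, 6]
```

This gives you closure-like behavior (extra bound state) without leaving the standard library.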

Upvotes: 6
