Reputation: 8908
Can anyone provide insight into why using a lambda or a nested function (f) makes concurrent.futures.ProcessPoolExecutor hang in the following code example?
import concurrent.futures

def f2(s):
    return len(s)

def main():
    def f(s):
        return len(s)

    data = ["a", "b", "c"]
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        # results = pool.map(f, data)                 # hangs
        # results = pool.map(lambda d: len(d), data)  # hangs
        # results = pool.map(len, data)               # works
        results = pool.map(f2, data)                  # works
        print(list(results))

if __name__ == "__main__":
    main()
Upvotes: 3
Views: 6440
Reputation: 166
Long story short: Pool and ProcessPoolExecutor both have to serialize everything they send to their workers. Serializing (also called pickling) a function doesn't store its code; it stores only the function's qualified name, which the worker process then imports again when it needs the function back. For this to work, the function has to be defined at the top level of a module, since nested functions and lambdas are not importable by the child process. That is why errors like the following show up:
AttributeError: Can't pickle local object 'MyClass.mymethod.<locals>.mymethod'
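You can see the same constraint with pickle directly, no pool involved (a minimal sketch; top_level, outer and nested are illustrative names):

import pickle

def top_level(s):
    return len(s)

def outer():
    def nested(s):  # only exists inside outer(), so it has no importable name
        return len(s)
    return nested

pickle.dumps(top_level)  # works: serialized as a reference to its module-level name
pickle.dumps(len)        # works: builtins are importable too

try:
    pickle.dumps(outer())
except AttributeError as e:
    print(e)  # Can't pickle local object 'outer.<locals>.nested'

try:
    pickle.dumps(lambda s: len(s))
except (pickle.PicklingError, AttributeError) as e:
    print(e)  # lambdas have no importable name either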
To avoid this problem there are some workarounds out there, but I've not found them reliable. If you're flexible about using other packages, pathos is an alternative that actually works: it serializes with dill instead of pickle, and dill can serialize lambdas and nested functions by value rather than by name. For instance, the following won't hang:
import pathos
import os

class SomeClass:
    def __init__(self):
        self.words = ["a", "b", "c"]

    def some_method(self):
        def run(s):
            return len(s)
        return list(pool.map(run, self.words))

pool = pathos.multiprocessing.Pool(os.cpu_count())
print(SomeClass().some_method())
and it will indeed print
[1, 1, 1]
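If you'd rather stay in the standard library, the f2 pattern from the question generalizes: define the worker function at module level and bind any extra state with functools.partial instead of a closure. A sketch under that assumption (scaled_len and the factor argument are illustrative):

import concurrent.futures
from functools import partial

def scaled_len(factor, s):
    # top-level function, so it pickles by name
    return factor * len(s)

def main():
    data = ["a", "b", "c"]
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        # a partial pickles as long as the wrapped function and
        # the bound arguments are themselves picklable
        results = pool.map(partial(scaled_len, 2), data)
        print(list(results))  # [2, 2, 2]

if __name__ == "__main__":
    main()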
Upvotes: 6