Let's suppose that I have an inner function:
def higher(x):
    def lower(y):
        print(x + y)
    return lower

h = higher(10)
I can successfully run this function on a separate process:
from multiprocessing import Process
p = Process(target=h, args=(20,))
p.start()
30
But when I try to run it on a pool of processes, I get a pickle error:
from multiprocessing import Pool

if __name__ == '__main__':
    h = higher(10)
    p = Pool(processes=10)
    p.map(h, [1, 2, 3])

AttributeError: Can't pickle local object 'higher.<locals>.lower'
I am wondering why Process and Pool behave differently here, since I thought a Pool was just a set of processes. Honestly, I expected the version with Process to raise the same pickle error too. (I noticed that the same happens with lambda functions.)
multiprocessing.Process spawns a subprocess that you have to manage yourself: join(), terminate(), etc. It reinitializes all imports and function definitions that are defined outside of the if __name__ == '__main__' block.

So in the case of Process, the function higher() is defined within the subprocess's scope.
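One way to check that serialization really is the dividing line (a minimal sketch, assuming a platform where fork is the default start method, e.g. Linux): forcing the spawn start method makes Process pickle its target as well, and it then fails with the same error.

import multiprocessing as mp

def higher(x):
    def lower(y):
        print(x + y)
    return lower

if __name__ == '__main__':
    mp.set_start_method('spawn')          # fork is the default on Linux
    h = higher(10)
    p = mp.Process(target=h, args=(20,))
    p.start()  # AttributeError: Can't pickle local object 'higher.<locals>.lower'
    p.join()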
Pool, on the other hand, uses worker processes that receive tasks and submit results via queues, and queues require serialization (pickling) (source).

So in the case of Pool, the function returned by higher() is passed to a worker process via a queue. This throws an error with inner and lambda functions, because they are not picklable.
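You can reproduce Pool's failure without a Pool at all, and work around it by giving the pool something picklable (a minimal sketch; lower_toplevel and the functools.partial workaround are my own illustration, not from the question):

import pickle
from functools import partial
from multiprocessing import Pool

def higher(x):
    def lower(y):
        print(x + y)
    return lower

def lower_toplevel(x, y):
    # module-level functions are pickled by reference, so this one is fine
    print(x + y)

if __name__ == '__main__':
    h = higher(10)
    try:
        pickle.dumps(h)  # the same step Pool performs internally
    except (AttributeError, pickle.PicklingError) as e:
        print(e)         # Can't pickle local object 'higher.<locals>.lower'

    g = partial(lower_toplevel, 10)  # partials pickle if the wrapped function does
    with Pool(processes=4) as p:
        p.map(g, [1, 2, 3])          # prints 11, 12, 13 (order may vary)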
multiprocessing uses pickle for serialization. The dill library should be able to serialize inner and lambda functions, unlike pickle.
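For example (a quick sketch, assuming dill is installed): dill can round-trip the closure that pickle rejects. Note that the standard library Pool will not use dill on its own; the multiprocess package, a fork of multiprocessing maintained alongside dill, is the usual drop-in replacement if you want a dill-backed Pool.

import dill

def higher(x):
    def lower(y):
        return x + y
    return lower

h = higher(10)

payload = dill.dumps(h)        # pickle.dumps(h) would raise AttributeError here
restored = dill.loads(payload)
print(restored(20))            # 30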