Luckk

Reputation: 538

Why multiprocessing.Process works with inner functions but multiprocessing.Pool does not?

Let's suppose that I have an inner function:

def higher(x):
    def lower(y):
        print(x + y)
    return lower
h = higher(10)

I can successfully run this function on a separate process

from multiprocessing import Process
p = Process(target=h, args=(20,))
p.start()
30

But when I try to run it on a pool of processes I get a pickle error:

from multiprocessing import Pool
if __name__ == '__main__':
    h = higher(10)
    p = Pool(processes=10)
    p.map(h, [1, 2, 3])
AttributeError: Can't pickle local object 'higher.<locals>.lower'

I am wondering why there is this difference in behavior between Process and Pool, which I thought was just a set of processes. Honestly, I expected the Process version to raise the same pickle error. (I noticed that the same happens with lambda functions.)

Upvotes: 2

Views: 142

Answers (1)

deseuler

Reputation: 419

Process

multiprocessing.Process spawns a single subprocess that you manage yourself: start(), join(), terminate(), etc. With the default fork start method on Unix, the child process is created as a copy of the parent, so h (including its closure over x) already exists in the child's memory and never needs to be pickled.

So in the case of Process, the function returned by higher() is simply inherited by the subprocess. (Under the spawn start method, the default on Windows and macOS, the target is pickled, and this example would fail for Process too.)

Pool

Pool, on the other hand, dispatches tasks to a set of long-lived worker processes through queues, and everything that goes through a queue must be serialized with pickle (source).

So in the case of Pool, the function h is passed to the worker processes via a queue. This raises an error for inner and lambda functions, because pickle cannot serialize them.
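You can see the pickling difference without Pool at all, and work around it by moving the logic to a module-level function combined with functools.partial. This is a minimal sketch; the partial-based workaround and the name add are my illustration, not part of the original answer:

```python
import pickle
from functools import partial

def higher(x):
    def lower(y):
        return x + y
    return lower

h = higher(10)

# Inner (local) functions cannot be pickled, which is exactly
# what Pool tries to do when sending tasks over its queue:
try:
    pickle.dumps(h)
except AttributeError as e:
    print(e)  # Can't pickle local object 'higher.<locals>.lower'

# A module-level function is pickled by reference, so a partial
# over it survives the trip through a queue and works with Pool.map:
def add(x, y):
    return x + y

h2 = partial(add, 10)
pickle.dumps(h2)  # no error
print(h2(20))     # 30
```

The same partial(add, 10) can be passed as Pool.map's function argument in place of h.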

Additional note:

multiprocessing uses pickle for serialization. The dill library can serialize inner and lambda functions, unlike pickle.
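For example, a round trip through dill succeeds where pickle raises. This sketch assumes the third-party dill package is installed (pip install dill):

```python
import dill  # third-party; standard pickle would fail below

def higher(x):
    def lower(y):
        return x + y
    return lower

h = higher(10)

# dill serializes the inner function together with its closure over x:
payload = dill.dumps(h)
restored = dill.loads(payload)
print(restored(20))  # 30
```

Note that stock Pool still uses pickle internally; to actually get dill-based serialization in a pool you would need something like the pathos library, which builds on dill.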

Upvotes: 2
