Delgan

Reputation: 19677

Why are "pickle" and "multiprocessing picklability" so different in Python?

Using Python's multiprocessing on Windows requires many arguments to be "picklable" when they are passed to child processes.

import multiprocessing

class Foobar:
    def __getstate__(self):
        print("I'm being pickled!")

def worker(foobar):
    print(foobar)

if __name__ == "__main__":
    # Uncomment this on Linux
    # multiprocessing.set_start_method("spawn")

    foobar = Foobar()
    process = multiprocessing.Process(target=worker, args=(foobar, ))
    process.start()
    process.join()

The documentation mentions this explicitly several times:

Picklability

Ensure that the arguments to the methods of proxies are picklable.

[...]

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

[...]

More picklability

Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.

However, I noticed two main differences between "multiprocessing pickle" and the standard pickle module, and I have trouble making sense of all of this.


multiprocessing.Queue() objects are not "picklable", yet they can be passed to child processes

import pickle
from multiprocessing import Queue, Process

def worker(queue):
    pass

if __name__ == "__main__":
    queue = Queue()

    # RuntimeError: Queue objects should only be shared between processes through inheritance
    pickle.dumps(queue)

    # Works fine
    process = Process(target=worker, args=(queue, ))
    process.start()
    process.join()

Not picklable if defined in "main"

import pickle
from multiprocessing import Process

def worker(foo):
    pass

if __name__ == "__main__":
    class Foo:
        pass

    foo = Foo()

    # Works fine
    pickle.dumps(foo)

    # AttributeError: Can't get attribute 'Foo' on <module '__mp_main__' from 'C:\\Users\\Delgan\\test.py'>
    process = Process(target=worker, args=(foo, ))
    process.start()
    process.join()

If multiprocessing does not use pickle internally, then what are the inherent differences between these two ways of serializing objects?

Also, what does "inherit" mean in the context of multiprocessing? How am I supposed to prefer it over pickle?

Upvotes: 6

Views: 4207

Answers (1)

Davis Herring

Reputation: 40013

When a multiprocessing.Queue is passed to a child process, what is actually sent is a file descriptor (or handle) obtained from pipe(), which the parent must have created before creating the child. The error from pickle exists to prevent attempts to send a Queue over another Queue (or a similar channel), since by then it’s too late to use it. (Unix systems do actually support sending a pipe over certain kinds of socket, but multiprocessing doesn’t use such features.) It’s expected to be “obvious” that certain multiprocessing types, which would otherwise be useless, can be sent to child processes, so the documentation makes no mention of the apparent contradiction.
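
As a rough illustration of that guard (a sketch only, not a description of multiprocessing's internals), pickling a Queue outside of process creation raises the RuntimeError quoted in the question. The helper name try_pickle_queue is hypothetical.

```python
import pickle
from multiprocessing import Queue


def try_pickle_queue():
    """Return the error message raised when a Queue is pickled by hand."""
    queue = Queue()
    try:
        pickle.dumps(queue)
    except RuntimeError as exc:
        # The Queue refuses to be serialized outside of process creation.
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_pickle_queue())
```

Passing the same queue in the args of a Process does not hit this error, because in that case the child is being created at the moment the queue is handed over.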

Since the “spawn” start method can’t create the new process with any Python objects already created, it has to re-import the main script to obtain the relevant function/class definitions. It doesn’t set __name__ to "__main__" as the original run does (the script is imported as __mp_main__, so the guarded startup code is not executed again), which means anything defined only under that guard will not be available. (Here, it is the unpickling in the child that failed, which is why your manual pickling works.)
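
The unpickling failure can be reproduced without multiprocessing. The sketch below assumes a stand-in module called fake_main (hypothetical, as is the helper demo_missing_class) that plays the role of the main script. pickle records only the module and name of a class, so loading fails once that name is no longer defined in the module, which is roughly the situation the spawned child is in.

```python
import pickle
import sys
import types


def demo_missing_class():
    # Hypothetical module standing in for the parent's main script.
    module = types.ModuleType("fake_main")
    sys.modules["fake_main"] = module

    class Foo:
        pass

    # Make Foo look like a top-level class of fake_main.
    Foo.__module__ = "fake_main"
    Foo.__qualname__ = "Foo"
    module.Foo = Foo

    # Pickling works: Foo can be found as fake_main.Foo.
    payload = pickle.dumps(Foo())

    # Simulate the child, whose re-import skipped the guarded block.
    del module.Foo
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        return f"unpickling failed: {exc}"
    finally:
        del sys.modules["fake_main"]
    return "unpickled"


if __name__ == "__main__":
    print(demo_missing_class())
```

Moving the class definition to the top level of the script, outside the guard, makes it available after the re-import.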

The fork methods start the children with the parent’s objects (at the time of the fork only) still existing; this is what is meant by inheritance.
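
A small sketch of inheritance, assuming a platform where the fork start method exists (so not Windows). The names worker and run_with_fork are hypothetical. The lambda can't be pickled, yet the forked child can call it because it starts with a copy of the parent's objects.

```python
import multiprocessing


def worker(callback, results):
    # Runs in the child; callback was inherited, never pickled.
    results.put(callback(20))


def run_with_fork():
    # Assumption: the "fork" start method is available on this platform.
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    callback = lambda x: x + 1  # lambdas cannot be pickled
    process = ctx.Process(target=worker, args=(callback, results))
    process.start()
    value = results.get(timeout=10)
    process.join()
    return value


if __name__ == "__main__":
    print(run_with_fork())
```

With the spawn start method the same arguments would have to be pickled, and the lambda would be rejected.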

Upvotes: 7
