Reputation: 19677
Using Python's multiprocessing
on Windows will require many arguments to be "picklable" while passing them to child processes.
import multiprocessing
class Foobar:
def __getstate__(self):
print("I'm being pickled!")
def worker(foobar):
print(foobar)
if __name__ == "__main__":
# Uncomment this on Linux
# multiprocessing.set_start_method("spawn")
foobar = Foobar()
process = multiprocessing.Process(target=worker, args=(foobar, ))
process.start()
process.join()
The documentation mentions this explicitly several times:
Picklability
Ensure that the arguments to the methods of proxies are picklable.
[...]
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from
multiprocessing
need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.[...]
More picklability
Ensure that all arguments to
Process.__init__()
are picklable. Also, if you subclassProcess
then make sure that instances will be picklable when theProcess.start
method is called.
However, I noticed two main differences between "multiprocessing
pickle" and the standard pickle
module, and I have trouble making sense of all of this.
multiprocessing.Queue()
are not "pickable" yet passable to child processesimport pickle
from multiprocessing import Queue, Process
def worker(queue):
pass
if __name__ == "__main__":
queue = Queue()
# RuntimeError: Queue objects should only be shared between processes through inheritance
pickle.dumps(queue)
# Works fine
process = Process(target=worker, args=(queue, ))
process.start()
process.join()
import pickle
from multiprocessing import Process
def worker(foo):
pass
if __name__ == "__main__":
class Foo:
pass
foo = Foo()
# Works fine
pickle.dumps(foo)
# AttributeError: Can't get attribute 'Foo' on <module '__mp_main__' from 'C:\\Users\\Delgan\\test.py'>
process = Process(target=worker, args=(foo, ))
process.start()
process.join()
If multiprocessing
does not use pickle
internally, then what are the inherent differences between these two ways of serializing objects?
Also, what does "inherit" mean in the context of multiprocessing? How am I supposed to prefer it over pickle?
Upvotes: 6
Views: 4207
Reputation: 40013
When a multiprocessing.Queue
is passed to a child process, what is actually sent is a file descriptor (or handle) obtained from pipe
, which must have been created by the parent before creating the child. The error from pickle
is to prevent attempts to send a Queue
over another Queue
(or similar channel), since it’s too late to use it then. (Unix systems do actually support sending a pipe over certain kinds of socket, but multiprocessing
doesn’t use such features.) It’s expected to be “obvious” that certain multiprocessing
types can be sent to child processes that would otherwise be useless, so no mention is made of the apparent contradiction.
Since the “spawn” start method can’t create the new process with any Python objects already created, it has to re-import the main script to obtain relevant function/class definitions. It doesn’t set __name__
like the original run for obvious reasons, so anything that is dependent on that setting will not be available. (Here, it is unpickling that failed, which is why your manual pickling works.)
The fork methods start the children with the parent’s objects (at the time of the fork only) still existing; this is what is meant by inheritance.
Upvotes: 7