Reputation: 764
The multiprocessing module and pickling.
The multiprocessing module seems to pickle things behind the scenes whenever you use it, and I'd like to understand that better.
Apparently, when items can't be pickled (for whatever reason), they can't be passed as arguments to a Process or Pool object in the multiprocessing module. Why is this?
Is there a complete list or description explaining when items can't be pickled?
Thanks to anyone that can help.
Upvotes: 0
Views: 63
Reputation: 35207
So pickle is very limited in what it can serialize. The full list is pretty much given in the docs, here:
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
and here:
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
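If you'd rather check a specific object than read the list, a quick empirical test is to try pickling it. This is only a rough sketch: `is_picklable` is a hypothetical helper name (not part of pickle or multiprocessing), and the set of exceptions it catches is my assumption about the usual failure modes, not an exhaustive list.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj.

    Hypothetical helper for illustration; it only attempts pickle.dumps
    and reports whether that raised one of the usual errors.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain containers of built-in values pickle fine; objects that wrap
# OS-level or interpreter state (locks, generators) do not.
results = {
    "list of ints": is_picklable([1, 2, 3]),
    "nested dict": is_picklable({"a": (1, 2.0, None)}),
    "thread lock": is_picklable(threading.Lock()),
    "generator": is_picklable(x for x in range(3)),
}

for label, ok in results.items():
    print(label, "->", "picklable" if ok else "not picklable")
```

Anything that fails this check will generally also fail when handed to a Process or Pool as an argument.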
It gets worse. Pickling doesn't really work in the interpreter, mainly because pickle primarily serializes by reference. It doesn't actually pickle the function or the class object; it serializes a string that is essentially their name:
>>> import pickle
>>> import math
>>> pickle.dumps(math.sin)
'cmath\nsin\np0\n.'
So, if you have built your function, class, or whatever in the interpreter, then you essentially can't pickle the object with pickle. pickle records the object as living in the __main__ module, and whatever later has to load it can't find the definition in __main__. This is also why things fail to serialize with multiprocessing in the interpreter.
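Here is a small sketch of the by-reference behaviour using only the standard library, written for Python 3. The variable names are just for illustration, and the exact exception type for the lambda case is something I'm hedging on, so the code catches both likely candidates.

```python
import math
import pickle

# A module-level function is pickled by reference: the payload carries the
# module name and the attribute name, not the function's code.
payload = pickle.dumps(math.sin)
names_only = b"math" in payload and b"sin" in payload

# Unpickling looks the name up again, so you get back the very same object.
same_object = pickle.loads(payload) is math.sin

# A lambda has no importable name, so the lookup-by-name scheme has nothing
# to point at and pickling fails.
try:
    pickle.dumps(lambda x: x ** 2)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(names_only, same_object, lambda_pickled)
```

The same name lookup is what has to succeed on the receiving side, which is why an object that exists only in one interpreter session is a problem for another process.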
However, there is a good solution. You could use a better serializer (like dill), and a fork of multiprocessing that leverages a better serializer:
>>> import dill
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool()
>>>
>>> def squared(x):
... return x**2
...
>>> dill.dumps(squared)
'\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_unmarshal\nq\x01Ufc\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x08\x00\x00\x00|\x00\x00d\x01\x00\x13S(\x02\x00\x00\x00Ni\x02\x00\x00\x00(\x00\x00\x00\x00(\x01\x00\x00\x00t\x01\x00\x00\x00x(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x07\x00\x00\x00squared\x01\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x02\x85q\x03Rq\x04c__builtin__\n__main__\nU\x07squaredq\x05NN}q\x06tq\x07Rq\x08.'
>>>
>>> p.map(squared, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>>
There's a decent list of sorts for what can serialize and what can't here:
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
It's not comprehensive, but most things can be serialized with dill.
Get pathos and dill here: https://github.com/uqfoundation
Upvotes: 1