user3436624

Reputation: 764

pickling and multiprocessing

The multiprocessing module and pickling.

Whenever I use the multiprocessing module, some pickling seems to happen behind the scenes, and I'd like to understand it better.

Apparently, when items can't be pickled (for whatever reason), they can't be passed as arguments to a Process or Pool object in the multiprocessing module. Why is this?

Is there a complete list or description explaining when items can't be pickled?

Thanks to anyone who can help.

Upvotes: 0

Views: 63

Answers (1)

Mike McKerns

Reputation: 35207

So, pickle is very limited in what it can serialize. The full list is pretty much given in the docs, here for Python 3: https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled and here for Python 2: https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
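A quick way to check whether a particular object will survive the trip is simply to try pickling it. The helper below is a minimal Python 3 sketch; `is_picklable` and `module_level_square` are hypothetical names used only for illustration, not part of any library.

```python
import pickle
import threading


def module_level_square(x):
    """A top-level function: pickle stores it by module and name only,
    so this works as long as its module can be imported again."""
    return x ** 2


def is_picklable(obj):
    """Hypothetical helper: return True if the standard pickle module can serialize obj.

    It just attempts pickle.dumps. The exception raised for an unpicklable
    object depends on the object and the Python version, so several are caught.
    """
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


candidates = {
    "list of ints": [1, 2, 3],
    "dict": {"a": 1},
    "top-level function": module_level_square,
    "lambda": lambda x: x ** 2,           # no importable name to refer to
    "generator": (i for i in range(3)),   # suspended execution state
    "thread lock": threading.Lock(),      # wraps an OS-level resource
}

for label, obj in candidates.items():
    print(f"{label:20} picklable: {is_picklable(obj)}")
```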

It gets worse. Pickling doesn't work well for objects defined in the interpreter, mainly because pickle primarily serializes by reference. It doesn't actually pickle the function or class object; it serializes a string that is essentially its name:

>>> import pickle
>>> import math
>>> pickle.dumps(math.sin)
'cmath\nsin\np0\n.'
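The session above is from Python 2, where `pickle.dumps` returns a `str` using protocol 0. Under Python 3 the result is `bytes` in a newer protocol, but the idea is the same: the payload holds the module and attribute names, and unpickling just looks them up again. A small Python 3 sketch:

```python
import math
import pickle

# Serialize a built-in function. Only a reference ("math" + "sin") is stored.
payload = pickle.dumps(math.sin)
print(payload)

# The names are visible in the raw bytes; the function's implementation is not included.
print(b"math" in payload, b"sin" in payload)

# Unpickling imports math and fetches math.sin, yielding the very same object.
restored = pickle.loads(payload)
print(restored is math.sin)
```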

So, if you have built your function, class, or whatever in the interpreter, pickle records only a reference to __main__ plus the object's name, and whatever unpickles it must be able to find that name in its own __main__ module. A worker process generally can't find objects that exist only in your interactive session, which is also why things fail with multiprocessing in the interpreter.
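To see that multiprocessing really does pickle whatever it sends between processes, you can use one of its pipes directly, without starting any worker. `Connection.send()` pickles its argument before writing it, so it accepts a list but rejects a lambda, much as a `Pool` would fail to send a lambda as a task. This is a minimal Python 3 sketch; `try_send` is a hypothetical helper name.

```python
import pickle
from multiprocessing import Pipe


def try_send(conn, obj):
    """Hypothetical helper: attempt to send obj and report whether it could be pickled."""
    try:
        conn.send(obj)  # send() pickles obj before writing it to the pipe
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        return f"failed: {type(exc).__name__}"
    return "sent"


# Both ends live in this process; no worker process is started.
sender, receiver = Pipe()

print(try_send(sender, [1, 2, 3]))  # plain data pickles fine
print(receiver.recv())              # recv() unpickles it on the other end

print(try_send(sender, lambda x: x ** 2))  # a lambda has no importable name

sender.close()
receiver.close()
```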

However, there is a good solution: use a better serializer (like dill) together with a fork of multiprocessing that leverages it (like pathos.multiprocessing).

>>> import dill
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool()
>>> 
>>> def squared(x):
...   return x**2
... 
>>> dill.dumps(squared)
'\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_unmarshal\nq\x01Ufc\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x08\x00\x00\x00|\x00\x00d\x01\x00\x13S(\x02\x00\x00\x00Ni\x02\x00\x00\x00(\x00\x00\x00\x00(\x01\x00\x00\x00t\x01\x00\x00\x00x(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x07\x00\x00\x00squared\x01\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x02\x85q\x03Rq\x04c__builtin__\n__main__\nU\x07squaredq\x05NN}q\x06tq\x07Rq\x08.'
>>> 
>>> p.map(squared, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> 

There's a decent, though not comprehensive, list of what can and can't be serialized here (most things can be serialized with dill): https://github.com/uqfoundation/dill/blob/master/dill/_objects.py

Get pathos and dill here: https://github.com/uqfoundation

Upvotes: 1
