Reputation: 16508
I have a list of calculations I need to run. I'm parallelizing them using

from pathos.multiprocessing import ProcessingPool
pool = ProcessingPool(nodes=7)
values = pool.map(helperFunction, someArgs)

helperFunction creates a class called Parameters, which is defined in the same file as helperFunction:

import otherModule
class Parameters(otherModule.Parameters):
    ...

So far, so good. helperFunction does some calculations based on the Parameters object, changes some of its attributes, and finally stores them using pickle. Here's the relevant excerpt of the helper class (from a different module) that does the saving:
import pickle
import hashlib
import os

class cacheHelper():
    def __init__(self, fileName, attr=[], folder='../cache/'):
        self.folder = folder
        if len(attr) > 0:
            attr = self.attrToName(attr)
        else:
            attr = ''
        self.fileNameNaked = fileName
        self.fileName = fileName + attr

    def write(self, objects):
        with open(self.getFile(), 'wb') as output:
            for object in objects:
                pickle.dump(object, output, pickle.HIGHEST_PROTOCOL)
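(For completeness, objects dumped back-to-back into a single file like this can be read again with a matching load loop. This is a minimal sketch; read_all is a hypothetical helper, not part of the class above.)

```python
import pickle

def read_all(path):
    """Load every object that pickle.dump() appended to a single file."""
    objects = []
    with open(path, 'rb') as source:
        while True:
            try:
                objects.append(pickle.load(source))
            except EOFError:  # raised once the file is exhausted
                break
    return objects
```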
When it gets to pickle.dump(), it raises an exception that is hard to debug because the debugger won't step into the worker that actually hit it. Therefore I set a breakpoint right before the dump happens and entered that command manually. Here is the output:
>>> pickle.dump(objects[0], output, pickle.HIGHEST_PROTOCOL)
Traceback (most recent call last):
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-4d2cbb7c63d1>", line 1, in <module>
    pickle.dump(objects[0], output, pickle.HIGHEST_PROTOCOL)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 1376, in dump
    Pickler(file, protocol).dump(obj)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 396, in save_reduce
    save(cls)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/site-packages/dill/dill.py", line 1203, in save_type
    StockPickler.save_global(pickler, obj)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <class '__main__.Parameters'>: it's not found as __main__.Parameters
The odd thing is that this doesn't happen when I don't parallelize, i.e. when I loop through helperFunction manually. I'm pretty sure that I'm opening the right Parameters (and not the parent class).
I know it is tough to debug things without a reproducible example, so I don't expect any solutions on that part. Perhaps the more general question is: how does one use pickle.dump() via another module?

Upvotes: 2

Views: 542
Reputation: 10513
Straight from the Python docs:

12.1.4. What can be pickled and unpickled? The following types can be pickled:

- None, True, and False
- integers, floating point numbers, complex numbers
- strings, bytes, bytearrays
- tuples, lists, sets, and dictionaries containing only picklable objects
- functions defined at the top level of a module (using def, not lambda)
- built-in functions defined at the top level of a module
- classes that are defined at the top level of a module
- instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details)
Everything else can't be pickled. In your case, though it's very hard to say given only an excerpt of your code, I believe the problem is that the class Parameters is not defined at the top level of the module, hence its instances can't be pickled.
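The top-level rule is easy to check in isolation. Here is a minimal sketch (TopLevel and make_nested_instance are illustrative names, not from your code):

```python
import pickle

class TopLevel(object):
    """Defined at module top level: pickle stores it by qualified name."""
    def __init__(self, x):
        self.x = x

def make_nested_instance():
    class Nested(object):  # defined inside a function, unreachable by name
        pass
    return Nested()

# An instance of a top-level class round-trips fine.
restored = pickle.loads(pickle.dumps(TopLevel(42)))
print(restored.x)  # 42

# An instance of a locally defined class does not.
try:
    pickle.dumps(make_nested_instance())
except (pickle.PicklingError, AttributeError, TypeError) as err:
    print('cannot pickle:', type(err).__name__)
```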
The whole point of using pathos.multiprocessing (or its actively developed fork multiprocess) instead of the built-in multiprocessing is to avoid pickle, because there are far too many things the latter can't dump. pathos.multiprocessing and multiprocess use dill instead of pickle. And if you want to debug a worker, you can use trace.
NOTE: As Mike McKerns (the main contributor to multiprocess) rightfully noted, there are cases that even dill can't handle, though it is hard to formulate universal rules on that matter.
Upvotes: 2