Reputation: 30687
How can I pickle a dictionary object that contains instances of a class in one file (Python File 1) and pickle.load it in another file (Python File 2)?
I have a huge, complicated dataset made up of several files, and I created a class to store all of my attributes. I made a dictionary to store all of the samples and attributes: key = sample, value = instance of the class containing the attributes. Example below:
#Python File 1
import random
class Storage:
    def __init__(self,label,x,y):
        self.label = label; self.x = x; self.y = y
    def get_x(self): return(self.x)
    def get_y(self): return(self.y)
D_var_instance = {}
L = ["A","B","C"]
for var in L:
    D_var_instance[var] = Storage(label=var,x=random.random(),y=random.random())
print(D_var_instance["A"])
#<__main__.Storage instance at 0x102811128>
print(D_var_instance["A"].get_x())
#0.193517721574
It takes me a long time to build this with my real dataset. I tried using pickle and pickle.dump on the dictionary object, but it's not working:
#Python File 1
import pickle
pickle.dump(D_var_instance,open("/path/to/dump.txt","w"))
pickle.dump(Storage, open("/path/to/storagedump.txt","w"))
I tried loading in another Python file with this code:
#Python File 2
import pickle
Storage = pickle.load(open("/path/to/storagedump.txt","r"))
D_var_instance = pickle.load(open("/path/to/dump.txt","r"))
Got this error:
AttributeError: 'module' object has no attribute 'Storage'
Upvotes: 1
Views: 3114
Reputation: 26570
The problem here can be perfectly explained via this SO post right here
Ultimately, what is happening here is that when you are pickling your instances, you have to be able to reference your module appropriately with respect to where you pickled it from.
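A minimal sketch of why the module reference matters, assuming only the standard library. The in-memory module named storage is a hypothetical stand-in for a real storage.py file, built here so the snippet runs on its own. It suggests that the pickle holds the names needed to find the class rather than the class itself, and that loading fails once the class can no longer be found under that name.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a real storage.py file on disk.
storage = types.ModuleType("storage")
sys.modules["storage"] = storage

class Storage(object):
    def __init__(self, label, x, y):
        self.label = label
        self.x = x
        self.y = y

# Make the class look as if it had been defined inside storage.py.
Storage.__module__ = "storage"
Storage.__qualname__ = "Storage"
storage.Storage = Storage

data = pickle.dumps({"A": Storage("A", 0.1, 0.2)})

# The payload names the module and the class; it does not hold the class body.
names_stored = b"storage" in data and b"Storage" in data

# Loading works while storage.Storage can be resolved.
restored = pickle.loads(data)
restored_x = restored["A"].x

# Take the class away and the same bytes can no longer be loaded.
del storage.Storage
try:
    pickle.loads(data)
    load_error = None
except AttributeError as exc:
    load_error = exc

print(names_stored, restored_x, type(load_error).__name__)
```

The final failure is the same kind of AttributeError as in the question: the loader looks the class up by module and name, and nothing is there.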
Here is some code to illustrate this (explanation to follow):
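storage.py
# The class lives in its own module so both scripts can resolve it.
class Storage(object):
    def __init__(self, label, x, y):
        self.label = label; self.x = x; self.y = y
foo.py
import pickle
import random
from storage import Storage
D_var_instance = {}
L = ["A","B","C"]
for var in L:
    D_var_instance[var] = Storage(label=var,x=random.random(),y=random.random())
# Pickles are binary data, so open the file in "wb" mode.
pickle.dump(D_var_instance, open("/path/pickle.txt", "wb"))
boo.py
import pickle
# No explicit import of storage: pickle imports it while loading.
D_var_instance = pickle.load(open("/path/pickle.txt", "rb"))
print(D_var_instance)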
So, when you wrote your pickle from foo, the class is recorded by reference as storage.Storage. When you go to an entirely different module (boo.py) and try to unpickle, pickle has to resolve that reference again, and the load fails if the module it names cannot be found from where you are running.
There are different ways to solve this. Since I put everything at the same level, storage can be imported when the pickle is loaded, so you don't actually need to import anything in boo.py and it should work!
However, if your class and your pickle-writing code live in the same module, as in your case, then you will have to import the module that houses that code in boo.py.
I suggest you look at the two options provided in the SO post I linked to see which one satisfies you. But that should be your solution.
Running boo.py from IPython yields:
ipython boo.py
{'A': <storage.Storage instance at 0x1107b77e8>, 'C': <storage.Storage instance at 0x1107b7680>, 'B': <storage.Storage instance at 0x1107b7908>}
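To try the three-file layout without touching your own project, here is a rough sketch that writes storage.py, foo.py and boo.py into a temporary directory and runs the two scripts in separate interpreters. The file contents, the dump.pkl name and the fixed x/y values are assumptions chosen to keep the output predictable; they are not the original dataset.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import textwrap

workdir = tempfile.mkdtemp()

def write(name, source):
    """Write one small script into the temporary directory."""
    with open(os.path.join(workdir, name), "w") as handle:
        handle.write(textwrap.dedent(source))

write("storage.py", """
    class Storage(object):
        def __init__(self, label, x, y):
            self.label = label
            self.x = x
            self.y = y
""")
write("foo.py", """
    import pickle
    from storage import Storage
    data = {name: Storage(name, 1.0, 2.0) for name in ["A", "B", "C"]}
    with open("dump.pkl", "wb") as handle:
        pickle.dump(data, handle)
""")
write("boo.py", """
    import pickle
    # storage is never imported here; pickle imports it during load.
    with open("dump.pkl", "rb") as handle:
        data = pickle.load(handle)
    print(sorted(data), data["A"].x)
""")

# Each script runs in its own interpreter, like two separate Python files.
subprocess.check_call([sys.executable, "foo.py"], cwd=workdir)
output = subprocess.check_output(
    [sys.executable, "boo.py"], cwd=workdir
).decode().strip()
shutil.rmtree(workdir)

print(output)  # ['A', 'B', 'C'] 1.0
```

The load in boo.py succeeds because storage.py sits next to it and is therefore importable; moving storage.py out of that directory would be expected to make the load fail.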
Upvotes: 1
Reputation: 35217
You can make it easy on yourself by using dill instead of pickle. dill pickles class definitions along with class instances (instead of by reference, like pickle does). So, you don't need to do anything different other than import dill as pickle.
To simulate working in another file, I'll build a class, some class instances in a dict, then delete everything but the pickled string. You can reconstitute from there.
>>> class Foo(object):
...   def __init__(self, x):
...     self.x = x
...
>>> d = dict(f=Foo(1), g=Foo(2), h=Foo(3))
>>>
>>> import dill
>>> _stored_ = dill.dumps(d)
>>>
>>> del Foo
>>> del d
>>>
>>> d = dill.loads(_stored_)
>>> d['f'].x
1
>>> d['g'].x
2
>>> d['h'].x
3
>>> dill.dump_session()
I finish with a dump_session, to pickle everything in the interpreter to a file. Then, in a new Python session (potentially on a different machine), you can start up where you left off.
>>> import dill
>>> dill.load_session()
>>> d
{'h': <__main__.Foo object at 0x110c6cfd0>, 'g': <__main__.Foo object at 0x10fbce410>, 'f': <__main__.Foo object at 0x110c6b050>}
>>>
If you are looking for the traditional dump and load, that works too. It also works with ipython.
Upvotes: 3