O.rka

Reputation: 30687

Pickle a dictionary of class instances in Python

How can I pickle a dictionary object that contains instances of a Class in one file (Python File 1) and pickle.load in another file (Python File 2)?

I have a huge, complicated dataset made up of several files, and I created a class to store all of my attributes. I made a dictionary to store all of the samples and attributes: key = sample, value = instance of the class containing the attributes. Example below:

#Python File 1
import random

class Storage:
    def __init__(self,label,x,y): 
        self.label = label; self.x = x; self.y = y
    def get_x(self): return(self.x)
    def get_y(self): return(self.y)

D_var_instance = {}
L = ["A","B","C"]

for var in L: 
    D_var_instance[var] = Storage(label=var,x=random.random(),y=random.random())

print(D_var_instance["A"])
#<__main__.Storage instance at 0x102811128>

print(D_var_instance["A"].get_x())
#0.193517721574

It takes a long time to build this with my real dataset. I tried using pickle.dump on the dictionary object, but it's not working:

#Python File 1
import pickle
pickle.dump(D_var_instance,open("/path/to/dump.txt","w"))
pickle.dump(Storage, open("/path/to/storagedump.txt","w"))

I tried loading in another Python file with this code:

#Python File 2
import pickle
Storage = pickle.load(open("/path/to/storagedump.txt","r"))
D_var_instance = pickle.load(open("/path/to/dump.txt","r"))

Got this error:

AttributeError: 'module' object has no attribute 'Storage'

Upvotes: 1

Views: 3114

Answers (2)

idjaw

Reputation: 26570

The problem here can be perfectly explained via this SO post right here

Ultimately, pickle stores your instances with a reference to their class's module, so when you unpickle, that module has to be reachable under the same name it had when you pickled.

Here is some code to illustrate this (explanation to follow):

storage.py

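class Storage(object):
    # Same constructor as in the question, so foo.py can pass label, x and y
    def __init__(self, label, x, y):
        self.label = label; self.x = x; self.y = y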

foo.py

import pickle
import random
from storage import Storage

D_var_instance = {}
L = ["A","B","C"]

for var in L: 
    D_var_instance[var] = Storage(label=var,x=random.random(),y=random.random())

pickle.dump(D_var_instance, open("/path/pickle.txt", "wb"))

boo.py

import pickle
D_var_instance = pickle.load(open("/path/pickle.txt", "rb"))

When you write the pickle from foo.py, the class is recorded as storage.Storage. When you then unpickle from an entirely different module (boo.py), pickle has to resolve that reference again, and the load fails if it cannot find a Storage class in a module with that name.
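To make the "reference" part concrete, here is a minimal sketch that can be run as a single script. It assumes a throwaway module called storage_demo, built in memory purely for illustration as a stand-in for a storage.py file on disk; the variable names are hypothetical too. It checks that the pickled bytes carry the module and class names rather than the class body, and that loading fails once the module can no longer be imported.

```python
import pickle
import sys
import types

# Hypothetical in-memory stand-in for a storage.py file on disk.
module = types.ModuleType("storage_demo")
exec(
    "class Storage(object):\n"
    "    def __init__(self, label, x, y):\n"
    "        self.label, self.x, self.y = label, x, y\n",
    module.__dict__,
)
sys.modules["storage_demo"] = module

payload = pickle.dumps({"A": module.Storage("A", 0.1, 0.2)})

# The payload names the module and the class; it does not carry the class body.
has_module_name = b"storage_demo" in payload
has_class_name = b"Storage" in payload
has_class_body = b"__init__" in payload

# Loading works while the module can be imported...
restored = pickle.loads(payload)

# ...and fails once it cannot be found any more.
del sys.modules["storage_demo"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = type(exc).__name__

print(has_module_name, has_class_name, has_class_body, load_error)
```

The instance attributes (label, x, y) travel inside the pickle, but the methods stay behind in the module, which is why the module has to be importable on the loading side.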

There are different ways to solve this. Since I put all three files in the same directory, boo.py doesn't need to import storage itself: pickle imports it during the load, and it should just work!

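However, if your class and your pickle-writing code live in the same script, like yours did, the class is recorded as __main__.Storage, and boo.py's own __main__ has no Storage (hence the AttributeError). In that case you will have to import the class from the module that houses it in boo.py.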
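A rough way to reproduce both the failure and the import fix from one script is sketched below. The file names writer.py, reader.py, reader_fixed.py and dump.pkl are made up for the illustration, and the scripts are written to a temporary directory and run as separate Python processes. It assumes the writer guards its dump with if __name__ == "__main__" so that importing it does not rewrite the file.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical writer script: class and pickle.dump live in the same file.
WRITER = textwrap.dedent('''
    import pickle

    class Storage(object):
        def __init__(self, label):
            self.label = label

    if __name__ == "__main__":
        with open("dump.pkl", "wb") as fh:
            pickle.dump({"A": Storage("A")}, fh)
''')

# Hypothetical reader script: loads the pickle without knowing the class.
READER = textwrap.dedent('''
    import pickle
    with open("dump.pkl", "rb") as fh:
        print(sorted(pickle.load(fh)))
''')


def run(script, cwd):
    """Run a script in its own interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, script], cwd=cwd, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    for name, body in (("writer.py", WRITER), ("reader.py", READER)):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(body)

    wrote = run("writer.py", tmp)    # pickles a reference to __main__.Storage
    failed = run("reader.py", tmp)   # its __main__ has no Storage

    # Fix: bring the class into the reader's __main__ namespace first.
    with open(os.path.join(tmp, "reader_fixed.py"), "w") as fh:
        fh.write("from writer import Storage\n" + READER)
    fixed = run("reader_fixed.py", tmp)

print(failed.returncode != 0, "AttributeError" in failed.stderr)
print(fixed.stdout.strip())
```

Moving the class into its own module, as in the storage.py layout, avoids the extra import in every reader, so it is probably the cleaner choice for a larger project.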

I suggest you look at the two options provided in the SO post I linked to see which one satisfies you. But that should be your solution.

Running this script from iPython yields:

ipython boo.py
{'A': <storage.Storage instance at 0x1107b77e8>, 'C': <storage.Storage instance at 0x1107b7680>, 'B': <storage.Storage instance at 0x1107b7908>}

Upvotes: 1

Mike McKerns

Reputation: 35217

You can make it easy on yourself by using dill instead of pickle. dill pickles class definitions along with class instances (instead of by reference, like pickle does). So, you don't need to do anything different other than import dill as pickle.

To simulate working in another file, I'll build a class, some class instances in a dict, then delete everything but the pickled string. You can reconstitute from there.

>>> class Foo(object):
...   def __init__(self, x):
...     self.x = x
... 
>>> d = dict(f=Foo(1), g=Foo(2), h=Foo(3))
>>> 
>>> import dill
>>> _stored_ = dill.dumps(d)
>>>        
>>> del Foo
>>> del d
>>> 
>>> d = dill.loads(_stored_)
>>> d['f'].x
1
>>> d['g'].x
2
>>> d['h'].x
3
>>> dill.dump_session()

I finish with a dump_session, to pickle everything in the interpreter to a file. Then, in a new python session (potentially on a different machine), you can start up where you left off.

>>> import dill
>>> dill.load_session()
>>> d
{'h': <__main__.Foo object at 0x110c6cfd0>, 'g': <__main__.Foo object at 0x10fbce410>, 'f': <__main__.Foo object at 0x110c6b050>}
>>> 

If you are looking for the traditional dump and load, that works too. It also works with ipython.

Upvotes: 3
