How to speed up dill serialization to store Python object to file

Question

The documentation says that the output of sys.getsizeof() is in bytes. I'm trying to store a data structure that is a dictionary of class instances and lists. I called sys.getsizeof() on this dictionary and got 3352 bytes. I'm serializing it with dill so I can load it later, but it's taking a really, really long time.

The file is already 260 MB, which is much larger than the 3352 bytes reported by sys.getsizeof(). Does anyone know why the values are different and why it is taking so long to store?

Is there a more efficient way to store objects like this when running on a MacBook Air with 4 GB of memory?

dill is an incredible tool. I'm not sure if there are any parameters I can tweak to help with my low-memory issue. I know there's a protocol=2 for pickle, but it doesn't seem to store the environment as well as dill does.

import sys, dill
sys.getsizeof(D_storage_Data)  # Output is 3352
with open("storage.obj", "wb") as f:  # close the file once the dump finishes
    dill.dump(D_storage_Data, f)

Mike McKerns · Accepted Answer

I'm the dill author. See my comment on "If Dill file is too large for RAM is there another way it can be loaded". In short, the answer is that it depends on what you are pickling, and if it's class instances, the answer is yes. Try the byref setting. Also, if you are looking to store a dict of objects, you might want to map your dict to a directory of files using klepto. That way you can dump and load elements of the dict individually and still work through a dict API.
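To make the "dict mapped to a directory of files" idea concrete, here is a minimal sketch that uses only the standard library. It is not klepto's API: the helper names dump_entries and load_entry are hypothetical, plain pickle stands in for the serializer, and the keys are assumed to be strings that are valid file names.

```python
import pickle
import tempfile
from pathlib import Path


def dump_entries(mapping, directory):
    """Write each dict entry to its own file, named after its key."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for key, value in mapping.items():
        # Assumption: every key is a string that is safe to use as a file name.
        with open(directory / f"{key}.pkl", "wb") as fh:
            pickle.dump(value, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_entry(directory, key):
    """Read back one entry without loading any of the others."""
    with open(Path(directory) / f"{key}.pkl", "rb") as fh:
        return pickle.load(fh)


# Small illustrative data; the real dict from the question would go here.
storage = {"first": [1, 2, 3], "second": {"x": 1.5}}
archive_dir = Path(tempfile.mkdtemp()) / "storage"

dump_entries(storage, archive_dir)
second = load_entry(archive_dir, "second")  # only this one file is read
```

Because each entry lives in its own file, only the entry being read or written has to fit in memory at once. dill.dump and dill.load take the same (object, file) arguments, so they could be swapped in for the pickle calls if the values need dill's extra coverage.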

So especially when using dill, and especially in an ipynb, check out dill.settings. Serialization (dill, pickle, or otherwise) recursively pulls objects into the pickle, and so can often pull in all of globals. Use dill.settings to change what is stored by reference and what is stored by pickling.
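The gap between the sys.getsizeof() figure and the file size follows from this recursion: getsizeof measures only the container itself, while serialization walks everything the container refers to. The sketch below illustrates that and shows the byref option, both per call and through dill.settings. The Record class and the data sizes are made up for illustration, and how much byref actually saves depends on what the objects reference.

```python
import sys

import dill


class Record:
    """Hypothetical stand-in for the class instances in the question."""

    def __init__(self, n):
        self.values = list(range(n))


storage = {"a": Record(100_000), "b": Record(100_000)}

# Shallow: counts the dict's own hash table, not the Records or their lists.
shallow_size = sys.getsizeof(storage)

# Serialization follows every reference, so the payload covers the whole graph.
default_payload = dill.dumps(storage)

# byref=True asks dill to store classes (and similar objects) by reference,
# as pickle does, instead of pickling their definitions by value.
byref_payload = dill.dumps(storage, byref=True)

# The same option can be set globally through the settings dict.
previous = dill.settings["byref"]
dill.settings["byref"] = True
global_payload = dill.dumps(storage)
dill.settings["byref"] = previous  # restore the earlier value

print(shallow_size, len(default_payload), len(byref_payload))
restored = dill.loads(byref_payload)
```

Objects stored by reference have to be importable or already defined in the session that loads the pickle, which is the trade-off for the smaller payload.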
