Reputation: 8837
I have a very large python shelve object (6GB on disk). I want to be able to move it to another machine, and since shelves are not portable, I wanted to cPickle it. To do that, I first have to convert it to a dict.
For some reason, when I do dict(myShelf), the IPython process spikes up to 32GB of memory (all my machine has) and then seems to hang (or maybe just take a really long time).
Can someone explain this? And perhaps offer a potential workaround?
edit: using Python 2.7
Upvotes: 4
Views: 2136
Reputation: 279285
From my experience I'd expect pickling to be even more of a memory hog than what you've done so far. However, creating a dict loads every key and value in the shelf into memory at once, and you shouldn't assume that because your shelf is 6GB on disk, it will only be 6GB in memory. For example:
>>> import sys, pickle
>>> sys.getsizeof(1)
24
>>> len(pickle.dumps(1))
4
>>> len(pickle.dumps(1, -1))
5
So, a very small integer takes 5-6 times more memory as a Python int object (on my machine) than it does once pickled.
As for the workaround: you can write more than one pickled object to a file. So don't convert the shelf to a dict; just write a long sequence of keys and values to your file, then read an equally long sequence of keys and values on the other side to put into your new shelf. That way you only need one key/value pair in memory at a time. Something like this:
Write:
with open('myshelf.pkl', 'wb') as outfile:
    # Write the number of pairs first, so the reader knows when to stop
    pickle.dump(len(myShelf), outfile)
    # Dump one (key, value) tuple at a time rather than the whole dict
    for p in myShelf.iteritems():
        pickle.dump(p, outfile)
Read:
with open('myshelf.pkl', 'rb') as infile:
    # here myShelf is the new, empty shelf opened on the other machine
    for _ in xrange(pickle.load(infile)):
        k, v = pickle.load(infile)
        myShelf[k] = v
I think you don't actually need to store the length; you could just keep reading until pickle.load throws an exception (an EOFError) indicating it's run out of file.
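If you go that route, a minimal sketch of both sides might look like the following. The filenames and the use of cPickle with protocol 2 (faster and more compact on Python 2) are my assumptions, not requirements:

import cPickle as pickle  # assumption: cPickle for speed; plain pickle works too
import shelve

def dump_shelf(shelf, path):
    # Stream one (key, value) tuple at a time; no length prefix needed
    with open(path, 'wb') as outfile:
        for p in shelf.iteritems():
            pickle.dump(p, outfile, pickle.HIGHEST_PROTOCOL)

def load_shelf(shelf, path):
    # Keep reading pairs until pickle.load runs out of file
    with open(path, 'rb') as infile:
        while True:
            try:
                k, v = pickle.load(infile)
            except EOFError:
                break
            shelf[k] = v

# Hypothetical usage; 'old.db' and 'new.db' are placeholder shelf names:
# dump_shelf(shelve.open('old.db'), 'myshelf.pkl')   # on the old machine
# load_shelf(shelve.open('new.db'), 'myshelf.pkl')   # on the new machine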
Upvotes: 5