Reputation: 23178
I have this huge file with objects pickled into it; let's assume:
for obj in objects:
    pickle.dump(obj, myfile)
The objects are of different sizes, although they are all of the same type.
The file is appended to over a long period, on different occasions; from time to time, when the dumping process gets restarted, I need to read the last few objects.
Something like this:
myfile.seek(-1000, 2)
while myfile.tell() < mysize:
    objects.append(pickle.load(myfile))
Now, this obviously doesn't work, because -1000 usually isn't the start of one of the objects, so pickle raises an exception, etc.
While I could just wrap the load in try/except: pass and let pickle fail until it finds something it can unpickle, I don't really like the idea, and I suspect a failed load advances the file cursor too far on some read attempts, so I could be missing a few objects.
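For concreteness, the scan I'm wary of would look roughly like this (re-seeking before every attempt so a failed load can't silently skip bytes; even then, a load could in principle succeed at a wrong offset and return garbage):
import os, pickle

myfile = open('/path/to/my/file', 'rb')   # path is just for illustration
mysize = os.path.getsize('/path/to/my/file')
pos = max(mysize - 1000, 0)               # start somewhere near the end
objects = []
while pos < mysize:
    myfile.seek(pos)                      # always re-seek: a failed load may
    try:                                  # have moved the cursor arbitrarily
        objects.append(pickle.load(myfile))
        pos = myfile.tell()               # success: continue from the real end
    except Exception:
        pos += 1                          # failure: retry one byte further on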
Reading the file from the beginning is not an option because of its size.
Any ideas for making this work? Is there any way for pickle to check whether the current file cursor points at something that looks like a pickled object?
Upvotes: 1
Views: 2052
Reputation: 99375
One way is to do something like this:
import os, pickle, struct

myfile = open('/path/to/my/file', 'w+b')
myfile.write(struct.pack('L', 0))  # placeholder: a long of zeroes
index = []
for o in objects:
    index.append(myfile.tell())    # record where this object starts
    pickle.dump(o, myfile)
index_loc = myfile.tell()
pickle.dump(index, myfile)         # the index itself is pickled at the end
myfile.seek(0, os.SEEK_SET)        # back to the start...
myfile.write(struct.pack('L', index_loc))  # ...and overwrite the placeholder
Now you have an indexed file: when re-opening, read the index location from the initial bytes, then seek to that location and read the index. You should then be able to access any object in the file in a random-access manner. (Of course, you can generalise this by having the index be a dict of object key to file location - a sort of poor man's ZODB).
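For instance, re-opening could look roughly like this (a sketch, assuming the file is read on the same platform it was written on, so struct's native 'L' size matches):
import pickle, struct

myfile = open('/path/to/my/file', 'rb')
header = myfile.read(struct.calcsize('L'))
(index_loc,) = struct.unpack('L', header)  # where the pickled index starts
myfile.seek(index_loc)
index = pickle.load(myfile)                # list of per-object offsets
myfile.seek(index[-1])                     # jump straight to the last object
last_obj = pickle.load(myfile)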
Or, of course, you could use the shelve module.
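A minimal shelve sketch (shelve keeps its own on-disk index, keyed by strings; the path and key below are made up):
import shelve

db = shelve.open('/path/to/my/shelf')   # hypothetical path
db['obj0'] = {'example': 1}             # store any picklable object under a key
restored = db['obj0']                   # random access by key
db.close()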
Upvotes: 3
Reputation: 7682
Save, somewhere separate, the sequence of file sizes resulting from each update: each recorded size marks the end of one object, so you can later seek directly to those boundaries.
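For instance (a sketch; the file names and the fixed 8-byte '<Q' record format are assumptions):
import pickle, struct

# Append objects, recording the file size after each dump in a sidecar file.
data = open('data.pkl', 'ab')
idx = open('data.idx', 'ab')
for obj in objects:
    pickle.dump(obj, data)
    idx.write(struct.pack('<Q', data.tell()))  # end offset == new file size
data.close()
idx.close()

# Later: read the offsets back and seek straight to the last object.
with open('data.idx', 'rb') as f:
    raw = f.read()
offsets = struct.unpack('<%dQ' % (len(raw) // 8), raw)
with open('data.pkl', 'rb') as data:
    data.seek(offsets[-2] if len(offsets) > 1 else 0)  # last object starts
    last_obj = pickle.load(data)                       # at the previous end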
Upvotes: 0