Parand

Reputation: 106310

Python: Incrementally marshal / pickle an object?

I have a large object I'd like to serialize to disk. I'm finding marshal works quite well and is nice and fast.

Right now I'm creating my large object then calling marshal.dump. I'd like to avoid holding the large object in memory if possible - I'd like to dump it incrementally as I build it. Is that possible?

The object is fairly simple, a dictionary of arrays.

Upvotes: 4

Views: 2883

Answers (4)

Theran

Reputation: 3856

If all your object has to be is a dictionary of lists, then you may be able to use the shelve module. It presents a dictionary-like interface where the keys and values are stored in a database file instead of in memory. One limitation, which may or may not affect you, is that keys in Shelf objects must be strings. Storage will be more efficient if you specify protocol=-1 when creating the Shelf object, so that values are pickled with the most efficient binary representation.
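
A minimal sketch of that approach (the filename and keys here are just placeholders):

import shelve

# Open (or create) the shelf; protocol=-1 stores values with the newest binary pickle protocol.
db = shelve.open('large_object.shelf', protocol=-1)

# Each assignment writes that key's list to the database file, so the full
# structure never has to sit in memory at once.
db['key1'] = [1, 2, 3]
db['key2'] = [4, 5, 6]

db.close()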

Upvotes: 4

elo80ka

Reputation: 15825

The bsddb module's 'hashopen' and 'btopen' functions provide a persistent dictionary-like interface. Perhaps you could use one of these, instead of a regular dictionary, to incrementally serialize the arrays to disk?

import bsddb
import marshal

# hashopen returns a persistent, dictionary-like object backed by a hash database file
db = bsddb.hashopen('file.db')
db['array1'] = marshal.dumps(array1)  # values must be strings, so marshal each array
db['array2'] = marshal.dumps(array2)
...
db.close()

To retrieve the arrays:

db = bsddb.hashopen('file.db')
array1 = marshal.loads(db['array1'])
...

Upvotes: 4

David Berger

Reputation: 12803

You should be able to dump the item piece by piece to the file. The two design questions that need settling are:

  1. How are you building the object as you put it in memory?
  2. How do you need your data when it comes back out of memory?

If your build process populates the entire array associated with a given key in one go, you could just dump each key:array pair to the file as a separate dictionary:

big_hairy_dictionary['sample_key'] = pre_existing_array
with open('central_file', 'ab') as f:
    marshal.dump({'sample_key': big_hairy_dictionary['sample_key']}, f)

Then, to reload, each call to marshal.load on the open file returns a dictionary that you can use to update a central dictionary. But this is really only going to be helpful if, when you need the data back, you want to read 'central_file' once per key.
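
For example, a rough sketch of that read-back loop (filename as above) might be:

import marshal

central_dictionary = {}
with open('central_file', 'rb') as f:
    while True:
        try:
            # each dump wrote one {key: array} dict; merge them as they are read
            central_dictionary.update(marshal.load(f))
        except EOFError:
            break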

Alternatively, if you are populating the arrays element by element in no particular order, maybe try:

big_hairy_dictionary['sample_key'].append(single_element)
with open('marshaled_files/sample_key', 'ab') as f:
    marshal.dump(single_element, f)

Then, when you load it back, you don't necessarily need to build the entire dictionary to get what you need; you just open 'marshaled_files/sample_key' and call marshal.load on it until it raises EOFError, and you have everything associated with that key.
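
Roughly, that read-back loop for a single key might look like:

import marshal

sample_key_data = []
with open('marshaled_files/sample_key', 'rb') as f:
    while True:
        try:
            sample_key_data.append(marshal.load(f))
        except EOFError:  # marshal.load signals end of file with EOFError
            break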

Upvotes: 0

thebigjc

Reputation: 569

This very much depends on how you are building the object. Is it an array of sub-objects? Then you could marshal/pickle each array element as you build it. Is it a dictionary? The same idea applies: marshal/pickle each key's value as you build it.

If it is just a big, complex, hairy object, you might want to marshal.dump each piece of the object, and then apply whatever your 'building' process is when you read it back in.
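
As a rough illustration of that idea, assuming a dictionary of lists built one piece at a time (build_pieces and the filename are made up for the example):

import marshal

def build_pieces():
    # stand-in for the real build step; yields one (key, piece) pair at a time
    for i in range(3):
        yield 'key%d' % i, list(range(i * 10, i * 10 + 10))

with open('pieces.dat', 'wb') as out:
    for key, piece in build_pieces():
        # dump each pair as soon as it is built, so only one piece is in memory at a time
        marshal.dump((key, piece), out)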

Upvotes: 0
