Framester

Reputation: 35501

How to save big (not huge) dictionaries in Python?

My dictionary will consist of several thousand keys, with each key having a 1000x1000 numpy array as its value. I don't need the file to be human readable. Small size and fast loading times are more important.

First I tried savemat, but I ran into problems. Pickle resulted in a huge file. I assume the same would be true for CSV. I've read posts recommending JSON (readable text, probably huge) or a database (presumably complicated). What would you recommend for my case?

Upvotes: 4

Views: 1490

Answers (5)

HYRY

Reputation: 97321

You can use PyTables (http://www.pytables.org/moin) and save your data in HDF5 format.

Upvotes: 0

Ivo

Reputation: 5420

Google's Protobuf specification is designed to be extremely efficient on overhead. I'm not sure how fast it is at (de)serializing, but being Google, I imagine it's not shabby.

Upvotes: 0

tkf

Reputation: 3020

How about numpy.savez? It can save multiple numpy arrays in one file, and they are stored in binary form, so it should be faster than pickle.

Upvotes: 2

jterrace

Reputation: 67083

If you have a dictionary where the keys are strings and the values are arrays, like this:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]),
            'b': numpy.array([3,4]),
            'c': numpy.array([5,6])}

You can use numpy.savez to save them, by key, to a single .npz archive (uncompressed; numpy.savez_compressed writes a compressed one):

>>> numpy.savez('file.npz', **arrs)

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
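
Since small file size matters in the question, here is a rough sketch comparing numpy.savez with numpy.savez_compressed. The all-zero arrays and the file names plain.npz and packed.npz are only assumptions for illustration; zeros compress extremely well, and real data may shrink far less.

```python
import os
import tempfile

import numpy as np

# Illustrative data only: two 1000x1000 arrays of zeros.
arrs = {"a": np.zeros((1000, 1000)), "b": np.zeros((1000, 1000))}

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.npz")
    packed = os.path.join(tmp, "packed.npz")

    np.savez(plain, **arrs)              # zip archive, arrays stored as-is
    np.savez_compressed(packed, **arrs)  # same layout, but deflate-compressed

    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)

    # The returned NpzFile is lazy: an array is read from disk
    # only when its key is accessed.
    with np.load(packed) as npz:
        keys = sorted(npz.files)
        restored = npz["a"]

print(plain_size, packed_size, keys)
```

Because the archive is read lazily, loading one key out of several thousand should not require reading every array, though compressed arrays still have to be decompressed on access.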

Upvotes: 6

Greg Hewgill

Reputation: 993611

The filesystem itself is often an underappreciated data structure. You could have a dictionary that is a map from your keys to filenames, and then each file has the 1000x1000 array in it. Pickling the dictionary would be quick and easy, and then the data files can just contain raw data (which numpy can easily load).
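
A minimal sketch of that idea, with some assumptions: the helper names save_arrays and load_array, the index file index.pkl, and the array_000000.npy naming scheme are all made up for illustration. It also uses numpy's .npy format via numpy.save instead of truly raw bytes, so that dtype and shape are stored alongside the data.

```python
import pickle
from pathlib import Path

import numpy as np


def save_arrays(arrs, directory):
    """Write each array to its own .npy file and pickle a key -> filename map."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    index = {}
    for i, (key, arr) in enumerate(arrs.items()):
        # File names come from a counter, so keys need not be valid file names.
        filename = f"array_{i:06d}.npy"
        np.save(directory / filename, arr)
        index[key] = filename
    with open(directory / "index.pkl", "wb") as f:
        pickle.dump(index, f)
    return index


def load_array(directory, key):
    """Load the single array stored under `key`, leaving the others on disk."""
    directory = Path(directory)
    with open(directory / "index.pkl", "rb") as f:
        index = pickle.load(f)
    return np.load(directory / index[key])
```

Usage would look like save_arrays(my_dict, 'data_dir') followed by load_array('data_dir', some_key). Only the small index is unpickled on each lookup, so one array can be fetched without touching the rest.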

Upvotes: 3
