Reputation: 46423
I would like to store approximately 4000 NumPy arrays (of 1.5 MB each) in a serialized, uncompressed file (about 6 GB of data in total). Here is an example with 2 small arrays:
import numpy
d1 = {'array1': numpy.array([1, 2, 3, 4]), 'array2': numpy.array([5, 4, 3, 2])}
numpy.savez('myarrays', **d1)
d2 = numpy.load('myarrays.npz')
for k in d2:
    print(d2[k])
It works, but I would like to do the same thing in more than one step:
When saving, I would like to be able to save 10 arrays, then do some other task (which may take a few seconds), then write 100 more arrays, then do something else, then write another 50 arrays, and so on.
When loading: likewise, I would like to be able to load some arrays, then do some other task, then continue loading.
How can I do this with numpy.savez / numpy.load?
Upvotes: 4
Views: 3767
Reputation: 68682
I don't think you can do this with np.savez. This, however, is the perfect use case for HDF5.
As an example of how to do this in h5py:
import numpy as np
import h5py

h5f = h5py.File('test.h5', 'w')
h5f.create_dataset('array1', data=np.array([1, 2, 3, 4]))
h5f.create_dataset('array2', data=np.array([5, 4, 3, 2]))
h5f.close()

# Now open it back up and read the data
h5f = h5py.File('test.h5', 'r')
a = h5f['array1'][:]
b = h5f['array2'][:]
h5f.close()

print(a)
print(b)
# [1 2 3 4]
# [5 4 3 2]
And of course there are more sophisticated ways of doing this: organizing arrays via groups, adding metadata, pre-allocating space in the HDF5 file and then filling it in later, etc.
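For the incremental workflow in the question, you can re-open the same HDF5 file in append mode ('a') and add datasets in batches, then read only the datasets you need later. A minimal sketch (the file name, array names, and batch sizes are made up for illustration):

```python
import numpy as np
import h5py

# Write the first batch of arrays, then close the file.
with h5py.File('incremental.h5', 'w') as h5f:
    for i in range(10):
        h5f.create_dataset('array_%d' % i, data=np.arange(4) + i)

# ... do some other task here ...

# Re-open in append mode and add more arrays later;
# the existing datasets are left untouched.
with h5py.File('incremental.h5', 'a') as h5f:
    for i in range(10, 20):
        h5f.create_dataset('array_%d' % i, data=np.arange(4) + i)

# Loading is incremental too: a dataset is only read from
# disk when you slice it, so you can load a few arrays,
# do other work, and come back for the rest.
with h5py.File('incremental.h5', 'r') as h5f:
    names = sorted(h5f.keys())       # all 20 dataset names
    first = h5f['array_0'][:]        # reads just this dataset
```

Each `with` block closes the file cleanly, so partial batches are flushed to disk even if a later step fails.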
Upvotes: 8
Reputation: 11232
savez
in the current numpy just writes the arrays to temporary files with numpy.save
and then adds them to a zip archive (with or without compression).
If you're not using compression, you might as well skip the second step: just save your arrays one by one with numpy.save and keep all 4000 of them in a single folder.
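That one-file-per-array approach makes both saving and loading naturally incremental. A minimal sketch (the directory name and array names are made up for illustration):

```python
import os
import numpy as np

os.makedirs('arrays', exist_ok=True)

# Save a first batch, one .npy file per array.
for i in range(10):
    np.save(os.path.join('arrays', 'array_%d.npy' % i), np.arange(4) + i)

# ... do some other task here ...

# Save another batch later; existing files need no rewriting.
for i in range(10, 20):
    np.save(os.path.join('arrays', 'array_%d.npy' % i), np.arange(4) + i)

# Load only the arrays you need, when you need them.
a5 = np.load(os.path.join('arrays', 'array_5.npy'))
```

Since each array lives in its own .npy file, an interrupted run loses at most the array being written, and you can resume by checking which files already exist.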
Upvotes: 1