Basj

Reputation: 46423

Numpy savez / load thousands of arrays, but not in one step

I would like to store approximately 4000 numpy arrays (1.5 MB each) in a serialized, uncompressed file (about 6 GB of data). Here is an example with 2 small arrays:

import numpy
d1 = {'array1': numpy.array([1, 2, 3, 4]), 'array2': numpy.array([5, 4, 3, 2])}
numpy.savez('myarrays', **d1)

d2 = numpy.load('myarrays.npz')
for k in d2:
    print(d2[k])

It works, but I would like to do the same thing incrementally rather than in a single step: save the arrays one by one as they become available, and likewise load them back one by one instead of all at once.

How can I do this with numpy.savez / numpy.load?

Upvotes: 4

Views: 3767

Answers (2)

JoshAdel

Reputation: 68682

I don't think you can do this with np.savez. It is, however, the perfect use case for HDF5. See either:

http://www.h5py.org

or

http://www.pytables.org

As an example of how to do this in h5py:

import numpy as np
import h5py

# Write two arrays into an HDF5 file, each as its own named dataset
h5f = h5py.File('test.h5', 'w')
h5f.create_dataset('array1', data=np.array([1, 2, 3, 4]))
h5f.create_dataset('array2', data=np.array([5, 4, 3, 2]))
h5f.close()

# Now open it back up and read the data
h5f = h5py.File('test.h5', 'r')
a = h5f['array1'][:]
b = h5f['array2'][:]
h5f.close()
print(a)
print(b)
# [1 2 3 4]
# [5 4 3 2]

And of course there are more sophisticated ways of doing this: organizing arrays via groups, adding metadata, pre-allocating space in the HDF5 file and then filling it later, etc.
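For the incremental pattern asked about in the question, here is a minimal h5py sketch (the file name arrays.h5, the dataset names, and the tiny stand-in arrays are made up for illustration):

import numpy as np
import h5py

# Write incrementally: keep the file open and add one dataset per array
# as it becomes available.
with h5py.File('arrays.h5', 'w') as h5f:
    for i in range(4000):
        arr = np.arange(4) + i          # stand-in for one real 1.5 MB array
        h5f.create_dataset('array_%d' % i, data=arr)

# Read incrementally: opening the file does not load the data; each
# dataset is only read into memory when it is sliced.
with h5py.File('arrays.h5', 'r') as h5f:
    first = h5f['array_0'][:]
    # ... do something else, then load more arrays later ...
    second = h5f['array_1'][:]

Since datasets are read lazily, the loading side can also be spread over time without ever holding all 6 GB in memory at once.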

Upvotes: 8

w-m

Reputation: 11232

savez in the current numpy just writes the arrays to temporary files with numpy.save and then adds them to a zip archive (with or without compression).

If you're not using compression, you might as well skip that second step: just save your arrays one by one with numpy.save and keep all 4000 of them in a single folder.
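If you go that route, a minimal sketch of the save/load loop (the folder name arrays/, the file names, and the tiny stand-in arrays are made up for illustration):

import os
import numpy as np

os.makedirs('arrays', exist_ok=True)

# Save each array as its own uncompressed .npy file, one at a time.
for i in range(4000):
    arr = np.arange(4) + i                       # stand-in for one real array
    np.save(os.path.join('arrays', 'array_%d.npy' % i), arr)

# Later, load only the arrays you need, one by one.
a = np.load(os.path.join('arrays', 'array_0.npy'))

Each .npy file is then exactly what savez would have put inside the zip archive, just without the archive around it.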

Upvotes: 1
