Reputation: 46423
I would like to store approximately 4000 NumPy arrays (of 1.5 MB each) in a serialized, uncompressed file (about 6 GB of data in total). Here is an example with 2 small arrays:
import numpy
d1 = {'array1': numpy.array([1, 2, 3, 4]), 'array2': numpy.array([5, 4, 3, 2])}
numpy.savez('myarrays', **d1)
d2 = numpy.load('myarrays.npz')
for k in d2:
    print(d2[k])
It works, but I would like to do the same thing in more than one step:
When saving, I would like to be able to save 10 arrays, then do some other task (which may take a few seconds), then write 100 more arrays, then do something else, then write another 50 arrays, and so on.
When loading: likewise, I would like to be able to load some arrays, then do some other task, then continue loading.
How can I do this with numpy.savez / numpy.load?
Upvotes: 4
Views: 3767
Reputation: 68682
I don't think you can do this with np.savez. This, however, is the perfect use case for HDF5.
As an example of how to do this in h5py:
import numpy as np
import h5py

h5f = h5py.File('test.h5', 'w')
h5f.create_dataset('array1', data=np.array([1, 2, 3, 4]))
h5f.create_dataset('array2', data=np.array([5, 4, 3, 2]))
h5f.close()

# Now open it back up and read the data
h5f = h5py.File('test.h5', 'r')
a = h5f['array1'][:]
b = h5f['array2'][:]
h5f.close()

print(a)
print(b)
# [1 2 3 4]
# [5 4 3 2]
And of course there are more sophisticated ways of doing this: organizing arrays via groups, adding metadata, pre-allocating space in the HDF5 file and then filling it in later, etc.
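For the incremental workflow in the question, you can re-open the same HDF5 file in append mode ('a') and add datasets in batches, then read only the datasets you need later. A minimal sketch (the file name, array names, and batch sizes are made up for illustration):

```python
import numpy as np
import h5py

# Write the first batch of arrays, then close the file.
with h5py.File('incremental.h5', 'w') as h5f:
    for i in range(10):
        h5f.create_dataset('array_%d' % i, data=np.arange(4) + i)

# ... do some other task here ...

# Re-open in append mode and add more arrays later;
# the existing datasets are left untouched.
with h5py.File('incremental.h5', 'a') as h5f:
    for i in range(10, 20):
        h5f.create_dataset('array_%d' % i, data=np.arange(4) + i)

# Loading is incremental too: a dataset is only read from
# disk when you slice it, so you can load a few arrays,
# do other work, and come back for the rest.
with h5py.File('incremental.h5', 'r') as h5f:
    names = sorted(h5f.keys())       # all 20 dataset names
    first = h5f['array_0'][:]        # reads just this dataset
```

Each `with` block closes the file cleanly, so partial batches are flushed to disk even if a later step fails.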
Upvotes: 8
Reputation: 11232
savez
in the current numpy just writes the arrays to temporary files with numpy.save
and then adds them to a zip archive (with or without compression).
If you're not using compression, you might as well skip the second step: just save your arrays one by one with numpy.save and keep all 4000 of them in a single folder.
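That one-file-per-array approach makes both saving and loading naturally incremental. A minimal sketch (the directory name and array names are made up for illustration):

```python
import os
import numpy as np

os.makedirs('arrays', exist_ok=True)

# Save a first batch, one .npy file per array.
for i in range(10):
    np.save(os.path.join('arrays', 'array_%d.npy' % i), np.arange(4) + i)

# ... do some other task here ...

# Save another batch later; existing files need no rewriting.
for i in range(10, 20):
    np.save(os.path.join('arrays', 'array_%d.npy' % i), np.arange(4) + i)

# Load only the arrays you need, when you need them.
a5 = np.load(os.path.join('arrays', 'array_5.npy'))
```

Since each array lives in its own .npy file, an interrupted run loses at most the array being written, and you can resume by checking which files already exist.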
Upvotes: 1