Reputation: 396
I would like to write Numpy arrays with shape (3, 225, 400) into a binary file.
These arrays are basically generated by using a screen buffer, and each screen has a label. My goal is to save each screen with its label.
numpy.save receives only two arguments: file pointer and array to be saved. The only option seems to be appending labels to arrays as follows:
with open(file, 'wb') as f:
    np.save(f, np.append(buffer, [label]))
However, I would prefer not to do this. Another approach might be to save only the array and then write "\t" followed by the label, as in regular binary writing:
with open(file, 'wb') as f:
    np.save(f, buffer)
    f.write("\t" + label)
I am not sure whether np.save moves the file pointer to new line after saving.
Considering that I will be saving hundreds of thousands of array-label pairs at a high frequency, what would you suggest in terms of efficiency?
Upvotes: 2
Views: 1959
Reputation: 21947
If you have a dict like
mydict = { "label0" : array0, "label1" : array1 ... }
just
np.savez( "my.npz", **mydict )  # returns None, so there is nothing to assign
# == np.savez( "my.npz", label0=array0, label1=array1 ... )
load = np.load( "my.npz" ) # like `mydict`
print( "my.npz labels:" )
print( "\n".join( load.keys() ) )
array0 = load["label0"]
...
Notes:
Don't compress; do pay attention to the array formats, e.g. np.uint8.
Always add mydict["runinfo"] = "who what when".
For a summary of xx.npz, see the little gist npzinfo.
np.load( ... mmap_mode ) ?
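To illustrate the note about array formats, here is a minimal sketch of how much the dtype matters for a single screen. It assumes the pixel values fit in 0..255, which is typical for screen buffers but is my assumption, not something stated in the question; the variable names are made up for illustration.

```python
import numpy as np

# One screen with the shape from the question, in NumPy's default float64.
screen_f64 = np.zeros((3, 225, 400))

# The same screen as uint8 (assumes pixel values are integers in 0..255).
screen_u8 = screen_f64.astype(np.uint8)

print(screen_f64.nbytes)  # 2160000 bytes per screen
print(screen_u8.nbytes)   # 270000 bytes per screen
```

With hundreds of thousands of screens, that factor of 8 applies to both disk space and write time.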
Upvotes: 0
Reputation: 19250
One option is to save to a numpy (NPZ) file. I have included an example below. np.savez and np.savez_compressed allow one to save multiple arrays to one file.
import numpy as np
# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"
# Save. Can use np.savez here instead.
np.savez_compressed("output.npz", buffer=buffer, label=label)
# Load.
npzfile = np.load("output.npz")
np.testing.assert_equal(npzfile["buffer"], buffer)
np.testing.assert_equal(npzfile["label"], label)
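Since the question mentions hundreds of thousands of pairs, one possible variation is to collect screens in memory and write them in batches, one NPZ file per batch, instead of one file per screen. This is only a sketch: the batch size, the file name, the label values and the uint8 dtype are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Fake batch of screens and labels (hypothetical values).
rng = np.random.RandomState(0)
buffers = [rng.randint(0, 256, size=(3, 225, 400), dtype=np.uint8)
           for _ in range(4)]
labels = ["left", "right", "left", "shoot"]

# Stack the screens into one (N, 3, 225, 400) array and store the labels
# as a parallel string array, so index i pairs buffers[i] with labels[i].
path = os.path.join(tempfile.mkdtemp(), "batch_0000.npz")
np.savez(path, buffers=np.stack(buffers), labels=np.array(labels))

# Load the batch back.
with np.load(path) as batch:
    loaded_buffers = batch["buffers"]
    loaded_labels = batch["labels"]
```

Keeping the labels as a fixed-width string array (rather than Python objects) means the file loads without pickle.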
Another option is to use HDF5 using h5py. The organization of an HDF5 file is similar to a filesystem (root is / and datasets can be created with names like /data/buffers/dataset1). One way of organizing the buffers and labels is to create a dataset for each buffer and set a label attribute on it.
import h5py
import numpy as np
# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"
this_dataset = "/buffers/0"
# Save to HDF5.
with h5py.File("output.h5", "w") as f:
    f.create_dataset(this_dataset, data=buffer, compression="lzf")
    f[this_dataset].attrs.create("label", label)
# Load.
with h5py.File("output.h5", "r") as f:
    # Index with [...] to read the data into memory; a bare Dataset
    # handle becomes invalid once the file is closed.
    loaded_buffer = f[this_dataset][...]
    loaded_label = f[this_dataset].attrs["label"]
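For a very large number of pairs, an alternative layout to one dataset per buffer is a single resizable dataset that grows along the first axis, with a parallel dataset for the labels. The following is a rough sketch of that idea, not a tuned implementation: the helper append_pair, the dataset names, the uint8 dtype and the one-screen chunk size are my own assumptions, and it needs an h5py version that provides h5py.string_dtype (2.10 or later).

```python
import os
import tempfile

import h5py
import numpy as np

shape = (3, 225, 400)
path = os.path.join(tempfile.mkdtemp(), "screens.h5")


def append_pair(f, buffer, label):
    """Hypothetical helper: grow both datasets by one row and fill it."""
    buffers, labels = f["buffers"], f["labels"]
    n = buffers.shape[0]
    buffers.resize(n + 1, axis=0)
    labels.resize(n + 1, axis=0)
    buffers[n] = buffer
    labels[n] = label


with h5py.File(path, "w") as f:
    # maxshape=None on the first axis makes the datasets resizable.
    f.create_dataset("buffers", shape=(0,) + shape,
                     maxshape=(None,) + shape,
                     dtype=np.uint8, chunks=(1,) + shape)
    f.create_dataset("labels", shape=(0,), maxshape=(None,),
                     dtype=h5py.string_dtype())
    rng = np.random.RandomState(0)
    for i in range(3):
        screen = rng.randint(0, 256, size=shape, dtype=np.uint8)
        append_pair(f, screen, "label%d" % i)

with h5py.File(path, "r") as f:
    n_saved = f["buffers"].shape[0]
    last_label = f["labels"][n_saved - 1]
    # Depending on the h5py version, strings come back as bytes or str.
    if isinstance(last_label, bytes):
        last_label = last_label.decode()
```

Resizing one row at a time keeps the sketch short; in practice it may be worth growing the datasets in larger steps.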
Upvotes: 2