bbasaran

Reputation: 396

numpy.save to store 3D Numpy array together with a label

I would like to write Numpy arrays with shape (3, 225, 400) into a binary file.

These arrays are basically generated by using a screen buffer, and each screen has a label. My goal is to save each screen with its label.

numpy.save receives only two arguments: file pointer and array to be saved. The only option seems to be appending labels to arrays as follows:

with open(file, 'wb') as f:
   np.save(f, np.append(buffer, [label]) )

However, I would rather avoid this. Another approach might be to save only the array and then write "\t" plus the label as a regular binary write:

with open(file, 'wb') as f:
   np.save(f, buffer)
   f.write(("\t" + label).encode())  # file is opened in binary mode, so write bytes

I am not sure whether np.save moves the file pointer to a new line after saving.

Given that I will be saving hundreds of thousands of array-label pairs at a high frequency, what would you suggest in terms of efficiency?

Upvotes: 2

Views: 1959

Answers (2)

denis

Reputation: 21947

If you have a dict like

mydict = { "label0" : array0, "label1" : array1 ... }

just

np.savez( "my.npz", **mydict )  # returns None, so nothing to assign
    # == np.savez( "my.npz", label0=array0, label1=array1 ... )

load = np.load( "my.npz" )  # like `mydict`
print( "my.npz labels:" )
print( "\n".join( load.keys() ) )
array0 = load["label0"]
...

Notes:
Don't compress; do pay attention to the array dtype, e.g. np.uint8.
Always add mydict["runinfo"] = "who what when".
For a summary of xx.npz, see the little gist npzinfo.
np.load( ... mmap_mode ) ?
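To illustrate the dtype note above, here is a minimal sketch. It assumes the screen buffer holds pixel values in 0..255, which the question does not state; the names screen_f64 and screen_u8 are made up for the example, and an in-memory io.BytesIO stands in for "my.npz".

```python
import io

import numpy as np

# Hypothetical screen buffer: 3 channels, 225 x 400 pixels, values 0..255.
rng = np.random.default_rng(0)
screen_f64 = rng.integers(0, 256, size=(3, 225, 400)).astype(np.float64)

# Casting to uint8 is lossless here only because of the assumed value range.
screen_u8 = screen_f64.astype(np.uint8)

print(screen_f64.nbytes)  # 2160000 bytes per screen as float64
print(screen_u8.nbytes)   # 270000 bytes per screen as uint8

# Round trip through an uncompressed .npz held in memory.
mem = io.BytesIO()
np.savez(mem, label0=screen_u8)
mem.seek(0)
with np.load(mem) as loaded:
    restored = loaded["label0"]

print(restored.dtype)  # uint8
```

With hundreds of thousands of screens, the factor of 8 between float64 and uint8 probably matters more than compression does.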

Upvotes: 0

jkr

Reputation: 19250

One option is to save to a numpy (NPZ) file. I have included an example below. np.savez and np.savez_compressed allow one to save multiple arrays to one file.

import numpy as np

# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"

# Save. Can use np.savez here instead.
np.savez_compressed("output.npz", buffer=buffer, label=label)

# Load.
npzfile = np.load("output.npz")

np.testing.assert_equal(npzfile["buffer"], buffer)
np.testing.assert_equal(npzfile["label"], label)
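For many array-label pairs, one possible variation (a sketch, not something I have benchmarked) is to collect a batch of buffers, stack them, and save the stack together with an array of labels in a single NPZ file. The batch size of 4, the label strings, and the in-memory io.BytesIO target are all assumptions made for the example; a real run would write to a file per batch.

```python
import io

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: four screens with one label each.
buffers = [rng.normal(size=(3, 225, 400)) for _ in range(4)]
labels = ["left", "right", "left", "idle"]

# Stack into one array of shape (4, 3, 225, 400); labels become a string array.
batch = np.stack(buffers)
label_array = np.array(labels)

# Save both arrays in one archive; row i of "buffers" pairs with labels[i].
target = io.BytesIO()
np.savez(target, buffers=batch, labels=label_array)

# Load.
target.seek(0)
with np.load(target) as npzfile:
    loaded_buffers = npzfile["buffers"]
    loaded_labels = npzfile["labels"]

print(loaded_buffers.shape)    # (4, 3, 225, 400)
print(loaded_labels.tolist())  # ['left', 'right', 'left', 'idle']
```

This keeps the number of files and save calls down, at the cost of holding a batch in memory before each write.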

Another option is HDF5, via h5py. An HDF5 file is organized much like a filesystem (the root is / and datasets can be created with names like /data/buffers/dataset1). One way to organize the buffers and labels is to create a dataset for each buffer and set a label attribute on it.

import h5py
import numpy as np

# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"

this_dataset = "/buffers/0"

# Save to HDF5.
with h5py.File("output.h5", "w") as f:
    f.create_dataset(this_dataset, data=buffer, compression="lzf")
    f[this_dataset].attrs.create("label", label)

# Load.
with h5py.File("output.h5", "r") as f:
    loaded_buffer = f[this_dataset][...]  # read into memory; the dataset handle is invalid once the file closes
    loaded_label = f[this_dataset].attrs["label"]

Upvotes: 2
