equaeghe

Reputation: 1784

h5py: Compound datatypes and scale-offset in the compression pipeline

Using NumPy and h5py, it is possible to create ‘compound datatype’ datasets to be stored in an HDF5 file:

import h5py
import numpy as np

# Create a new file using default properties.
file = h5py.File('compound.h5', 'w')

# Define a compound datatype with an int32 field and a float32 field.
comp_type = np.dtype([('fieldA', 'i4'), ('fieldB', 'f4')])

# Create a dataset with that datatype under the root group.
dataset = file.create_dataset("comp", (4,), comp_type)

It is also possible to use various compression filters in a ‘compression pipeline’, among them the ‘scale-offset’ filter:

cmpr_dataset = file.create_dataset("cmpr", (4,), 'i4', scaleoffset=0)
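To make the plain-integer case concrete, here is a minimal sketch, assuming a reasonably recent h5py and an in-memory file (the `core` driver with `backing_store=False`, so nothing touches the disk). The file name and the sample values are made up for illustration. For integer data the scale-offset filter is lossless, and a value of 0 asks HDF5 to work out the minimum number of bits itself.

```python
import h5py
import numpy as np

values = np.array([1000, 1001, 1003, 1002], dtype="i4")

# In-memory file with an illustrative name; nothing is written to disk.
with h5py.File("scaleoffset_demo.h5", "w", driver="core",
               backing_store=False) as f:
    # scaleoffset=0 on integer data lets HDF5 choose the number of bits.
    # The filter needs a chunked layout, so the chunk shape is given explicitly.
    dset = f.create_dataset("cmpr", data=values, chunks=(4,), scaleoffset=0)
    filter_setting = dset.scaleoffset  # filter parameter as reported by h5py
    roundtrip = dset[...]              # read the data back through the filter

print("scaleoffset setting:", filter_setting)
print("lossless round trip:", np.array_equal(values, roundtrip))
```

Note that the parameter is given once per dataset, which is exactly why the compound case is unclear.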

However, it is not clear to me whether, and if so how, the scale-offset filter can be given a specific parameter (e.g., the 0 in the above example) for each of the different fields of a compound datatype.

More generally, it is not clear to me whether and how any filter can be applied with field-specific parameters.

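So, the question is: can a filter such as scale-offset be given field-specific parameters for a compound datatype, and if so, how?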

My guess (fear) is that the nature of how the compound data is stored (in one ‘column’, instead of each field in its own ‘column’) will prohibit application of such field-specific filters, but I wanted to check, just to be sure.

Upvotes: 1

Views: 793

Answers (1)

hpaulj

Reputation: 231385

Besides the h5py docs, look at the HDF5 docs. They go into more detail. If the underlying HDF5 library does not support this, then the h5py interface won't either.

https://support.hdfgroup.org/HDF5/doc/UG/OldHtmlSource/10_Datasets.html#ScaleOffset

Elsewhere the docs say that filters are applied to whole chunks.
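A small sketch of what "whole chunks" means in practice, assuming an in-memory file and made-up sample values: a byte-oriented filter such as gzip is accepted for a compound dataset, but its options are set once for the dataset and apply to every chunk, not per field.

```python
import h5py
import numpy as np

comp_type = np.dtype([("fieldA", "i4"), ("fieldB", "f4")])
records = np.zeros(4, dtype=comp_type)
records["fieldA"] = [1, 2, 3, 4]
records["fieldB"] = [0.5, 1.5, 2.5, 3.5]

# In-memory file with an illustrative name; nothing is written to disk.
with h5py.File("chunk_demo.h5", "w", driver="core",
               backing_store=False) as f:
    # gzip compresses the raw bytes of each chunk, so the compound layout
    # does not matter to it. There is one set of options for the dataset.
    dset = f.create_dataset("comp", data=records, chunks=(2,),
                            compression="gzip", compression_opts=4)
    chunk_shape = dset.chunks
    compression = dset.compression
    restored = dset[...]

print("chunk shape:", chunk_shape)
print("compression:", compression)
print("fieldA after round trip:", restored["fieldA"].tolist())
```

Each chunk here holds two complete records, with both fields interleaved, and the filter sees that block of bytes as a unit.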

The expression defining the compound type is pure NumPy. h5py must be translating that dtype into an equivalent HDF5 C struct description. The HDF5 docs include sample C and Fortran compound type definitions.
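The translation can be inspected through h5py's low-level `h5t` module. This is only a sketch of how to look at it, assuming the low-level API behaves as I remember; it does not need a file at all.

```python
import h5py
import numpy as np

comp_type = np.dtype([("fieldA", "i4"), ("fieldB", "f4")])

# Ask h5py for the HDF5 datatype it would use for this NumPy dtype.
tid = h5py.h5t.py_create(comp_type)
is_compound = isinstance(tid, h5py.h5t.TypeCompoundID)

members = []
for i in range(tid.get_nmembers()):
    name = tid.get_member_name(i)
    if isinstance(name, bytes):  # the low-level API may hand back bytes
        name = name.decode()
    offset = tid.get_member_offset(i)         # byte offset inside one record
    size = tid.get_member_type(i).get_size()  # size of the member in bytes
    members.append((name, offset, size))

print("compound type:", is_compound)
for name, offset, size in members:
    print(f"{name}: offset {offset}, {size} bytes")
```

The members sit at fixed byte offsets inside one record, much like a C struct, which matches the single ‘column’ storage the question worries about.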

All the docs say that the scale-offset filter applies only to integer and float types. That can be understood as excluding string, vlen, and compound types. What you are hoping is that it would still work with the numeric fields inside a compound type. I don't think so.
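This is easy to try. The sketch below, assuming a recent h5py and an in-memory file, first attempts scale-offset on a compound dataset and records whether it is refused (the exact exception type may depend on the h5py and HDF5 versions, hence the broad `except`). It then shows one possible workaround, which is my own suggestion rather than anything from the docs: store each field as its own dataset so each gets its own parameter. The group and dataset names are made up.

```python
import h5py
import numpy as np

comp_type = np.dtype([("fieldA", "i4"), ("fieldB", "f4")])
records = np.zeros(4, dtype=comp_type)
records["fieldA"] = [10, 11, 12, 13]
records["fieldB"] = [0.25, 0.5, 0.75, 1.0]

# In-memory file with an illustrative name; nothing is written to disk.
with h5py.File("per_field_demo.h5", "w", driver="core",
               backing_store=False) as f:
    # Attempt 1: scale-offset directly on the compound dataset.
    try:
        f.create_dataset("comp", data=records, chunks=(4,), scaleoffset=0)
        rejected = False
    except Exception as exc:  # exception type may vary between versions
        rejected = True
        print("compound + scaleoffset refused:", type(exc).__name__)

    # Attempt 2 (workaround, an assumption about what is acceptable):
    # one dataset per field, each with its own scale-offset parameter.
    grp = f.create_group("comp_fields")
    grp.create_dataset("fieldA", data=np.ascontiguousarray(records["fieldA"]),
                       chunks=(4,), scaleoffset=0)  # integers: lossless
    grp.create_dataset("fieldB", data=np.ascontiguousarray(records["fieldB"]),
                       chunks=(4,), scaleoffset=2)  # floats: 2 decimal digits
    field_a = grp["fieldA"][...]
    field_b = grp["fieldB"][...]

print("fieldA:", field_a.tolist())
print("fieldB close to original:",
      np.allclose(field_b, records["fieldB"], atol=0.01))
```

The price of the workaround is that the records are no longer stored together, so reading one full record means reading from several datasets.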

Upvotes: 1
