Monkone

Accessing datasets in HDF5 files containing unusual datatypes

I want to analyse some datasets in a .h5 file I have been given, using something like pandas. However, I have run into an issue that appears to be caused by some datatypes in the datasets not being supported by the tools I've tried so far. I've done a little reading, but I don't understand the tools or the problem well enough to troubleshoot effectively, so I'm looking for guidance.

I can access the datasets using HDFView, but exporting all the data to text files and then reading those files into something else isn't ideal. It also doesn't always print the data in a format I know how to use (e.g. booleans print as something like B@38b27cdc, with every such value appearing to be unique). I have also tried accessing the file from Python (h5py and PyTables) and from MATLAB. Code examples and outputs are given below.

h5py example using numpy

import h5py

f = h5py.File('filename.h5', 'r')
group = f["Run 1"]
dataset = group["datasetIAmInterestedIn"]

This is the end of the error output; let me know if you want to see the full traceback.

TypeError: No NumPy equivalent for TypeBitfieldID exists
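For context: TypeBitfieldID in this error is h5py's class for HDF5's bitfield datatype (H5T_BITFIELD), which h5py could not map to a NumPy dtype here. That would fit with HDFView showing the booleans as B@38b27cdc-style strings, which resemble Java's default text for a byte array. If the raw bytes of such a field can be obtained some other way, turning them into booleans is plain bit arithmetic. The stdlib sketch below assumes (unverified for this file) that each value is width_bytes bytes wide and that flags are numbered from the least-significant bit; bitfield_to_bools is a hypothetical helper, not part of h5py.

```python
from typing import List


def bitfield_to_bools(raw: bytes, width_bytes: int = 1,
                      byteorder: str = "little") -> List[List[bool]]:
    """Split raw bitfield bytes into values of width_bytes bytes each and
    expand every value into its bits, least-significant bit first.

    Assumption: this bit numbering and byte order match how the file was
    written; check against a value you know before trusting the output.
    """
    if len(raw) % width_bytes:
        raise ValueError("raw data is not a whole number of values")
    values = []
    for start in range(0, len(raw), width_bytes):
        value = int.from_bytes(raw[start:start + width_bytes], byteorder)
        # Test each bit position in turn: bit 0 is the lowest bit
        values.append([bool((value >> bit) & 1) for bit in range(8 * width_bytes)])
    return values


if __name__ == "__main__":
    # Two made-up 1-byte values: 0b00000101 and 0b10000000
    for flags in bitfield_to_bools(b"\x05\x80"):
        print(flags)
```

If each boolean occupies a whole byte rather than a single bit, bool(byte) on each byte is enough and the bit expansion is unnecessary.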

PyTables example

import tables as tab

f = tab.File('filename.h5', 'r')
group = f.get_node("/Run 1")
group.datasetIAmInterestedIn

Output from command

/Run 1/datasetIAmInterestedIn(UnImplemented(58023,)) ''
NOTE: The UnImplemented object represents a PyTables unimplemented dataset present in the 'filename.h5' HDF5 file.  If you want to see this kind of HDF5 dataset implemented in PyTables, please contact the developers.

(I've included this in case it is helpful)

MATLAB

data = hdf5read("filename.h5","/Run 1/datasetIAmInterestedIn")

Output from command

Error using hdf5readc
Call to HDF5 library failed (unsupportedDatatype): "Datatype of an attribute value is not supported. Please disable the
reading of attribute values by setting the 'ReadAttributes' argument
to false. Type HELP HDF5INFO for more information.".

Error in hdf5read (line 100)
[data, attributes] = hdf5readc(settings.filename, ...

My attempts at using the ReadAttributes option have not produced any useful information or results.


Answers (1)

Jetpac

If you have a compound datatype, you'll need to define the compound type before you can read it. For example (the code here is MATLAB), suppose you have a compound datatype that stores log entries made up of two strings of different sizes, a double-precision float that stores a datetime, and an integer ID:

% Fixed-length, space-padded (Fortran-style) string types of 32 and 64 characters
char32_type = H5T.copy('H5T_FORTRAN_S1');
H5T.set_size(char32_type, 32);
char32_size = H5T.get_size(char32_type);

char64_type = H5T.copy('H5T_FORTRAN_S1');
H5T.set_size(char64_type, 64);
char64_size = H5T.get_size(char64_type);

% Native integer and double-precision types
int_type = H5T.copy('H5T_NATIVE_INT');
int_size = H5T.get_size(int_type);

dbl_type = H5T.copy('H5T_NATIVE_DOUBLE');
dbl_size = H5T.get_size(dbl_type);

% Member sizes, in the order the members appear in each record
sizes = [
    char32_size
    dbl_size
    char64_size
    int_size
    ];

offset = [ 0 ; cumsum(sizes) ]; % zero-based byte offset of each member (packed, no padding)

% Create the compound type and insert each member at its offset
log_type = H5T.create('H5T_COMPOUND', sum(sizes));
H5T.insert(log_type, 'user',        offset(1), char32_type);
H5T.insert(log_type, 'datetime',    offset(2), dbl_type);
H5T.insert(log_type, 'action',      offset(3), char64_type);
H5T.insert(log_type, 'id',          offset(4), int_type);
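The byte layout that the offset line builds is easy to check outside MATLAB. As a sketch, here is the same packed record described in stdlib Python (this is not h5py). It assumes H5T_NATIVE_INT is 4 bytes, as it usually is, and uses `<` in the struct format string so that no alignment padding is added, matching the cumsum offsets.

```python
import struct
from itertools import accumulate

# Member names and sizes in bytes, in the same order as the MATLAB example.
# Assumption: H5T_NATIVE_INT is a 4-byte int on the machine in question.
MEMBERS = [("user", 32), ("datetime", 8), ("action", 64), ("id", 4)]

sizes = [size for _, size in MEMBERS]
# Zero-based offsets: 0 followed by the running total, like [0; cumsum(sizes)]
offsets = [0] + list(accumulate(sizes))
record_size = sum(sizes)

# The same packed layout as a struct format string:
# '<' = little-endian with standard sizes and no alignment padding,
# '32s' / '64s' = fixed-length byte strings, 'd' = double, 'i' = 4-byte int
RECORD_FORMAT = "<32sd64si"

if __name__ == "__main__":
    for (name, size), off in zip(MEMBERS, offsets):
        print(f"{name:<10} offset={off:3d} size={size}")
    print("record size:", record_size, "struct size:", struct.calcsize(RECORD_FORMAT))
```

The last offset equals the total record size, which is why only the first four entries of offset are used in the H5T.insert calls.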

You can then use the compound type log_type to read or write the data. For example, to read a dataset /log from the HDF5 file, something like the following:

% log_file is the path to your HDF5 file
plist     = 'H5P_DEFAULT';
fid       = H5F.open(log_file, 'H5F_ACC_RDONLY', plist);
log_dset  = H5D.open(fid, '/log');
mem_space = 'H5S_ALL';
% Read the whole dataset, converting it to the memory type log_type
log = H5D.read(log_dset, log_type, mem_space, 'H5S_ALL', 'H5P_DEFAULT');

% Close the dataset and file once you're done
H5D.close(log_dset);
H5F.close(fid);
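H5D.read hands back one array per member rather than a list of records. To illustrate that reshaping, here is a stdlib Python sketch that decodes a buffer of packed records into one list per field. It assumes the same layout as the MATLAB example (little-endian, 4-byte int, no padding); decode_records is a hypothetical helper, not part of any HDF5 library, since h5py and MATLAB do this conversion for you.

```python
import struct

# Assumed packed layout matching the compound type above: user, datetime, action, id
RECORD_FORMAT = "<32sd64si"
FIELDS = ("user", "datetime", "action", "id")


def decode_records(buf: bytes) -> dict:
    """Turn a buffer of packed records into one list per field.

    This mirrors the shape of the MATLAB result: a struct with one array
    per member, rather than one entry per record.
    """
    record_size = struct.calcsize(RECORD_FORMAT)
    if len(buf) % record_size:
        raise ValueError("buffer is not a whole number of records")
    columns = {name: [] for name in FIELDS}
    # iter_unpack yields one tuple per record, in member order
    for record in struct.iter_unpack(RECORD_FORMAT, buf):
        for name, value in zip(FIELDS, record):
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    # Two made-up records, packed with the same format
    buf = struct.pack(RECORD_FORMAT, b"alice".ljust(32), 738000.5, b"login".ljust(64), 1)
    buf += struct.pack(RECORD_FORMAT, b"bob".ljust(32), 738001.25, b"logout".ljust(64), 2)
    cols = decode_records(buf)
    print(cols["id"], cols["datetime"])
```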

The result of the read (log) will be a struct with one field per member. The strings come back as char arrays in an inconvenient column format (one string per column), so they need to be transposed. Here I convert them to cell arrays, but you might prefer the string class or something else. I also convert the datetime to an array of dates in string format. The id field is fine and doesn't need fettling.

log.user    = cellstr(log.user');
log.action  = cellstr(log.action');
log.datestr = cellstr(datestr(log.datetime));
% log.id is okay as is
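If you end up doing this clean-up in Python instead, the two conversions are small. This sketch assumes the datetime member holds MATLAB serial date numbers (which is how datestr interprets a plain number) and that the strings are ASCII; both helper names are hypothetical.

```python
from datetime import datetime, timedelta


def fortran_str(raw: bytes, encoding: str = "ascii") -> str:
    """Decode a fixed-length, space-padded (Fortran-style) string.

    Like cellstr, this drops trailing spaces; it also drops trailing NUL
    bytes in case the writer used null padding instead of space padding.
    """
    return raw.decode(encoding).rstrip(" \x00")


def datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB serial date number to a Python datetime.

    MATLAB day 1 is 1-Jan-0000 while Python's ordinal 1 is 1-Jan-0001,
    so the two day counts differ by 366 (year 0 is a leap year).
    """
    days = int(datenum)
    fraction = datenum - days  # time of day as a fraction of a day
    return datetime.fromordinal(days) + timedelta(days=fraction) - timedelta(days=366)


if __name__ == "__main__":
    print(repr(fortran_str(b"alice".ljust(32))))
    print(datenum_to_datetime(730486.5))
```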

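If you don't know the layout of the compound type, this gets harder: you first need to find out each member's name, type and size (for example by inspecting the dataset in HDFView) before you can build a matching type.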
