Reputation: 51
I want to analyse some datasets in a .h5 file I have been given, using something like Pandas. However, some of the datatypes in the datasets do not seem to be supported by the tools I have tried so far. I have done a little reading, but I don't understand the tools or the problem well enough to troubleshoot effectively, so I am seeking guidance.
I can access the datasets using HDFView. However, exporting all the data to text files and then reading those files into something else isn't ideal, and the export doesn't always print the data in a format I know how to use (e.g. booleans print out as something like B@38b27cdc, with every such expression appearing to be unique).
I have tried accessing the file using Python (h5py and PyTables) and MATLAB. Code examples and outputs are given below.
h5py example using numpy
import h5py
f = h5py.File('filename.h5', 'r')
group = f["Run 1"]
dataset = group["datasetIAmInterestedIn"]
End of the error output (let me know if you want to see the full traceback):
TypeError: No NumPy equivalent for TypeBitfieldID exists
PyTables example
import tables as tab
f = tab.File('filename.h5', 'r')
group = f.get_node("/Run 1")
group.datasetIAmInterestedIn
Output from command
/Run 1/datasetIAmInterestedIn(UnImplemented(58023,)) ''
NOTE: The UnImplemented object represents a PyTables unimplemented dataset present in the 'filename.h5' HDF5 file. If you want to see this kind of HDF5 dataset implemented in PyTables, please contact the developers.
(I've included this in case it is helpful)
MATLAB
data = hdf5read("filename.h5","/Run 1/datasetIAmInterestedIn")
Output from command
Error using hdf5readc
Call to HDF5 library failed (unsupportedDatatype): "Datatype of an attribute value is not supported. Please disable the
reading of attribute values by setting the 'ReadAttributes' argument
to false. Type HELP HDF5INFO for more information.".
Error in hdf5read (line 100)
[data, attributes] = hdf5readc(settings.filename, ...
My attempts at using ReadAttributes have not yielded any useful information or results.
Upvotes: 0
Views: 738
Reputation: 2115
If you have a compound datatype, you'll need to define the compound type before you can read it. For example (the code here is MATLAB), suppose you have a compound datatype that stores logging entries with two strings (of different sizes), a double-precision float that stores a datetime, and an integer ID:
char32_type = H5T.copy('H5T_FORTRAN_S1');
H5T.set_size(char32_type, 32);
char32_size = H5T.get_size(char32_type);
char64_type = H5T.copy('H5T_FORTRAN_S1');
H5T.set_size(char64_type, 64);
char64_size = H5T.get_size(char64_type);
int_type = H5T.copy('H5T_NATIVE_INT');
int_size = H5T.get_size(int_type);
dbl_type = H5T.copy('H5T_NATIVE_DOUBLE');
dbl_size = H5T.get_size(dbl_type);
sizes = [
char32_size
dbl_size
char64_size
int_size
];
offset = [ 0 ; cumsum(sizes) ]; % zero-based
log_type = H5T.create('H5T_COMPOUND', sum(sizes));
H5T.insert(log_type, 'user', offset(1), char32_type);
H5T.insert(log_type, 'datetime', offset(2), dbl_type);
H5T.insert(log_type, 'action', offset(3), char64_type);
H5T.insert(log_type, 'id', offset(4), int_type);
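The offset arithmetic above can be checked outside MATLAB. The following is a minimal Python sketch using only the standard library. It assumes the same four fields and sizes as the example (32-byte string, 8-byte double, 64-byte string, 4-byte integer), a packed layout with no padding, and little-endian byte order. The names FIELDS and compound_layout are made up for illustration.

```python
from itertools import accumulate
import struct

# (name, struct format code, size in bytes); mirrors the MATLAB example.
FIELDS = [
    ("user", "32s", 32),
    ("datetime", "d", 8),
    ("action", "64s", 64),
    ("id", "i", 4),
]


def compound_layout(fields):
    """Return ({name: byte offset}, total size) for a packed record."""
    sizes = [size for _, _, size in fields]
    # Zero-based offsets: 0 followed by the running total of the sizes.
    running = [0] + list(accumulate(sizes))
    offsets = {name: off for (name, _, _), off in zip(fields, running)}
    return offsets, running[-1]


offsets, total = compound_layout(FIELDS)

# "<" means little-endian with no alignment padding (an assumption here).
record_format = "<" + "".join(code for _, code, _ in FIELDS)
```

The last running total plays the same role as sum(sizes) in the H5T.create call, and struct.calcsize(record_format) should agree with it if the layout really is packed.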
You can use this compound type, log_type, to read or write the data. For example, to read a dataset /log from the HDF5 file, something like the following should work:
plist = 'H5P_DEFAULT';
fid = H5F.open(log_file ,'H5F_ACC_RDONLY', plist);
log_dset = H5D.open(fid, '/log');
mem_space = 'H5S_ALL';
log = H5D.read(log_dset, log_type, mem_space, 'H5S_ALL', 'H5P_DEFAULT');
% Close the dataset and file once you're done
H5D.close(log_dset);
H5F.close(fid);
The output of the read (log) will be a struct. The strings will be char arrays in an inconvenient column format, so they need to be transposed. Here I convert them to cell arrays, but you might prefer the string class or something else. I convert the datetime to an array of dates in string format. The id field is fine as it is and needs no conversion.
log.user = cellstr(log.user');
log.action = cellstr(log.action');
log.datestr = cellstr(datestr(log.datetime));
% log.id is okay as is
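For anyone doing the same clean-up in Python rather than MATLAB, a rough equivalent is sketched below. It assumes the strings arrive as fixed-width, space-padded bytes (as H5T_FORTRAN_S1 implies) and that the datetime field holds MATLAB serial date numbers, which the use of datestr suggests but the post does not state. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def clean_fixed_string(raw: bytes) -> str:
    """Decode a fixed-width field and drop trailing space/NUL padding."""
    return raw.decode("ascii", errors="replace").rstrip(" \x00")


def datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB serial date number to a Python datetime.

    MATLAB counts days from year 0, Python ordinals from year 1,
    hence the 366-day shift (assumes the field is a MATLAB datenum).
    """
    whole_days = int(datenum)
    fraction = datenum - whole_days
    return (datetime.fromordinal(whole_days)
            + timedelta(days=fraction)
            - timedelta(days=366))
```

Applying clean_fixed_string to each user and action entry, and datenum_to_datetime to each datetime entry, would give plain Python strings and datetime objects that are easy to load into Pandas.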
If you don't know the layout of the compound type, though, you could be in trouble.
Upvotes: 1