Reputation: 135
I have been trying to read an .hdf5 file so that I can plot some of the data. To do so, I thought I would take the data in the .hdf5 file and process it into a .txt file. To check the data, I ended up using the code presented in "reading nested .h5 group into numpy array".
import numpy as np
import h5py

f = h5py.File('15524.h5', 'r')
list(f.keys())
dset = f['Measurement_15524']

def traverse_datasets(hdf_file):
    def h5py_dataset_iterator(g, prefix=''):
        for key in g.keys():
            item = g[key]
            path = f'{prefix}/{key}'
            if isinstance(item, h5py.Dataset):  # test for dataset
                yield (path, item)
            elif isinstance(item, h5py.Group):  # test for group (go down)
                yield from h5py_dataset_iterator(item, path)
    for path, _ in h5py_dataset_iterator(hdf_file):
        yield path

with h5py.File('15524.h5', 'r') as f:
    for dset in traverse_datasets(f):
        print('Path:', dset)
        print('Shape:', f[dset].shape)
        print('Data type:', f[dset].dtype)
This resulted in the following output:
Path: /Measurement_15524/Waveforms/waveforms_0
Shape: (200,)
Data type: [('digital_in', '<i4'), ('encoder_phi', '<f4'), ('encoder_theta', '<f4'), ('encoder_x',
'<f4'), ('encoder_y', '<f4'), ('encoder_z', '<f4'), ('id', '<i8'), ('line_index', '<i4'),
('motor_phi', '<f4'), ('motor_theta', '<f4'), ('motor_x', '<f4'), ('motor_y', '<f4'), ('motor_z',
'<f4'), ('sweep_dir', 'i1'), ('timestamp', '<f8'), ('type', 'S9'), ('waveform', '<f8', (4096,)),
('x_offset', '<f8'), ('x_spacing', '<f8')]
I believe that the data should be shaped in columns and rows, how can I visualise the output in that way?
How can I transform this into a .txt file?
I have been told that this problem could arise from the data being str rather than float, so I attempted to convert it to a float by replacing the line that defines f with:
f= h5py.File(float('15524.h5'), 'r')
But Python stated that:
ValueError: could not convert string to float: '15524.h5'
I don't programme often so my apologies if this is common knowledge.
Upvotes: 1
Views: 2003
Reputation: 7996
Welcome to HDF5 where the data schema can be just about any format, and you get to figure it out. :-) Fortunately, HDF5 is self-describing, so with a little Python magic we can do that.
Yes, some datasets are shaped in rows and columns; others are more complicated. Checking the shape and dtype is the right place to start. The dataset you encountered has a "table like" structure. It has 200 rows of data, and each row has named fields (aka columns) of different data types (names and types are defined by the dtype). Some of your values are integers (i4/i8), some are floats (f4/f8 aka real numbers), one is a string ('type'), and one is an array of floats ('waveform'). Here are the first few field names and types, taken from the dtype you printed: digital_in (<i4, integer), encoder_phi (<f4, float), encoder_theta (<f4, float), encoder_x (<f4, float).
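To make the "named fields" idea concrete, here is a minimal sketch of reading a compound dataset and pulling out fields by name. I don't have your file, so it first builds a small stand-in file: the path 'Measurement_demo/...', the file name demo.h5, and the all-zero values are made up, and only a subset of your fields is used. With your real file you would skip the "build" part and open '15524.h5' instead.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in dtype: a subset of the fields reported in the question.
demo_dtype = np.dtype([
    ('encoder_x', '<f4'),
    ('id', '<i8'),
    ('type', 'S9'),
    ('waveform', '<f8', (4096,)),
])

# Build a small demo file (hypothetical data; 200 rows like the real dataset).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
rows = np.zeros(200, dtype=demo_dtype)
rows['id'] = np.arange(200)
rows['type'] = b'waveform'
with h5py.File(demo_path, 'w') as h5f:
    h5f.create_dataset('Measurement_demo/Waveforms/waveforms_0', data=rows)

# Read it back and access fields (columns) by name.
with h5py.File(demo_path, 'r') as h5f:
    ds = h5f['Measurement_demo/Waveforms/waveforms_0']
    arr = ds[()]                  # whole dataset as a NumPy structured array
    print(arr.dtype.names)        # field names defined by the dtype
    ids = arr['id']               # one value per row
    waveforms = arr['waveform']   # one 4096-element array per row
    print(ids.shape, waveforms.shape)
```

Note that 'waveform' comes back as a 2-D array (rows by 4096 samples), which is why this dataset does not fit a simple rows-and-columns text file without deciding how to handle that field.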
Are the values you want to plot in the /Measurement_15524/Waveforms/waveforms_0 dataset? If so, you can access this data as either an h5py dataset object or a NumPy array (will be returned as a "record array"). From there, it is "relatively straightforward" to write the data to a csv (file of comma separated values) to plot with another application. You can write the entire dataset to the file, or (with a little more coding) you can just write the 2 columns you want to plot. Add some details about the columns you want to plot (as X and Y) and I can create some prototype code. Alternatively, you could use the matplotlib package and create your plots with Python.
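Until I know which columns you want, here is a rough sketch of the "write 2 columns to a csv" idea. The helper name export_two_fields() is my own, and the choice of 'timestamp' and 'encoder_x' as X and Y is only a guess for illustration. The demo file and its values are made up so the example runs on its own; point the function at '15524.h5' and your real dataset path to try it on your data.

```python
import os
import tempfile

import h5py
import numpy as np


def export_two_fields(h5_path, dset_path, x_field, y_field, csv_path):
    """Write two scalar fields of a compound dataset to a CSV file."""
    with h5py.File(h5_path, 'r') as h5f:
        arr = h5f[dset_path][()]      # structured NumPy array
    # Put the two fields side by side as a plain 2-column float array.
    table = np.column_stack((arr[x_field], arr[y_field]))
    np.savetxt(csv_path, table, delimiter=',',
               header=f'{x_field},{y_field}', comments='')
    return table.shape


# Hypothetical demo data standing in for the real file.
tmp_dir = tempfile.mkdtemp()
demo_h5 = os.path.join(tmp_dir, 'demo.h5')
demo_csv = os.path.join(tmp_dir, 'demo.csv')
demo = np.zeros(200, dtype=[('timestamp', '<f8'), ('encoder_x', '<f4')])
demo['timestamp'] = np.arange(200) * 0.5
demo['encoder_x'] = np.linspace(0.0, 1.0, 200)
with h5py.File(demo_h5, 'w') as h5f:
    h5f.create_dataset('Measurement_demo/Waveforms/waveforms_0', data=demo)

shape = export_two_fields(demo_h5, 'Measurement_demo/Waveforms/waveforms_0',
                          'timestamp', 'encoder_x', demo_csv)
print(shape)
```

This only works for fields that hold one value per row. The 'waveform' field holds 4096 values per row, so it would need its own file or one line per row.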
Note: there's nothing wrong with the method above, but it might be hard to understand if you are new to Python. There is a simpler way to traverse the HDF5 data structure with the h5py .visit() or .visititems() methods. The code below will do the same with .visititems(). All of the checks are done in the "visitor function" (traverse_nodes() in this example). The thing it does not do is yield the path back to the calling function.
import numpy as np
import h5py

def traverse_nodes(name, node):
    if isinstance(node, h5py.Group):
        print('\n', node.name, 'is a Group')
    elif isinstance(node, h5py.Dataset):
        print('\n', node.name, 'is a Dataset')
        print('Path:', name)
        print('Shape:', node.shape)
        print('Data type:', node.dtype)

with h5py.File('15524.h5', 'r') as h5f:
    h5f.visititems(traverse_nodes)
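If you do want the paths handed back to the caller, one possible workaround (a sketch, not the only way) is to have the visitor function append to a list. The names list_dataset_paths() and collect() are my own, and the demo file with 'grp/a' and 'grp/sub/b' is invented so the example runs without your data.

```python
import os
import tempfile

import h5py
import numpy as np


def list_dataset_paths(h5_path):
    """Return the full path of every dataset in the file."""
    paths = []

    def collect(name, node):
        # Only record datasets; groups are visited but skipped.
        if isinstance(node, h5py.Dataset):
            paths.append(node.name)
        # Returning None tells visititems() to keep going.

    with h5py.File(h5_path, 'r') as h5f:
        h5f.visititems(collect)
    return paths


# Hypothetical demo file with two datasets in nested groups.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(demo_path, 'w') as h5f:
    h5f.create_dataset('grp/a', data=np.arange(3))
    h5f.create_dataset('grp/sub/b', data=np.arange(4))

found = list_dataset_paths(demo_path)
print(found)
```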
Upvotes: 1