nicdelillo

Reputation: 677

Is there a single function in Python that shows the full structure of a .hdf5 file?

When opening a .hdf5 file, one can explore the levels, keys and names of the file in different ways. I wonder if there is a way or a function that displays all the available paths in the .hdf5 file, ultimately showing the whole tree.

Upvotes: 7

Views: 8755

Answers (6)

Vincent Stragier

Reputation: 91

I modified Alex44's solution to make it a minimal working example (MWE). I also made the function return a string instead of printing, which makes it more versatile, and I prefer to display the shape of each dataset instead of its length.

from __future__ import annotations  # allows the "X | Y" annotation on Python < 3.10

import h5py
import numpy as np


def hdf5_tree(hdf5_file: h5py.File | h5py.Group, prefix: str = "") -> str:
    """Return a string containing the tree representation of the HDF5 file.

    Args:
        hdf5_file (h5py.File | h5py.Group): the HDF5 file object.
        prefix (str, optional): the prefix. Defaults to "".

    Returns:
        str: the full or partial tree representation of the file.
    """
    tree_string = ""
    items_index = len(hdf5_file)

    for key, hdf5_value in hdf5_file.items():
        items_index -= 1

        branch_symbol = "├──"
        prefix_symbol = "│"

        if items_index == 0:
            # The last item closes the branch, so no vertical bar continues below it.
            branch_symbol = "└──"
            prefix_symbol = " "

        if isinstance(hdf5_value, h5py.Group):
            tree_string += f"{prefix}{branch_symbol} {key}\n"
            tree_string += hdf5_tree(hdf5_value, f"{prefix}{prefix_symbol}   ")

        elif hdf5_value.shape == ():
            # Scalar datasets have an empty shape tuple.
            tree_string += f"{prefix}{branch_symbol} {key} (scalar)\n"

        else:
            tree_string += f"{prefix}{branch_symbol} {key} {hdf5_value.shape}\n"

    return tree_string


with h5py.File("test.h5", mode="w") as hdf5_file:
    hdf5_group = hdf5_file.require_group("top_group")
    hdf5_annotations = hdf5_group.require_group("annotations")
    hdf5_annotations.require_group("test_0")
    hdf5_annotations.require_group("test_1")
    hdf5_signals = hdf5_group.require_group("signals")
    data = np.asarray([0, 5, 4, 0, 4, 5, 6], dtype=np.float64)
    hdf5_signals.create_dataset("EEG_SIGNAL", data=data)

    print(hdf5_tree(hdf5_file))

Upvotes: 0

huoneusto

Reputation: 1204

A quick and dirty solution:

import h5py

# visit() calls print with the path of every group and dataset in the file
with h5py.File('file.hdf5', 'r') as file:
    file.visit(print)
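visit() only passes the path name to the callable. A small variation, sketched below, uses visititems(), which also passes the object itself, so each line can show whether it is a group or a dataset and, for datasets, the shape and dtype. The file name example.h5 and the helper name describe are hypothetical, chosen only so the snippet runs on its own.

```python
import h5py
import numpy as np

# Build a small hypothetical file so the example is self-contained.
with h5py.File("example.h5", "w") as f:
    f.create_group("top_group/annotations")  # intermediate groups are created
    f.create_dataset("top_group/signals/EEG_SIGNAL", data=np.zeros(7))
    f.create_dataset("scalar_value", data=3.0)

lines = []


def describe(name, obj):
    """Callable for visititems(): receives the path name and the object."""
    depth = name.count("/")           # nesting level derived from the path
    label = name.rsplit("/", 1)[-1]   # last component of the path
    if isinstance(obj, h5py.Group):
        lines.append("    " * depth + label + "/")
    elif isinstance(obj, h5py.Dataset):
        lines.append("    " * depth + f"{label} {obj.shape} {obj.dtype}")
    # Returning None lets visititems() continue to the next object.


with h5py.File("example.h5", "r") as f:
    f.visititems(describe)

print("\n".join(lines))
```

Indenting by the number of "/" in the path keeps the callable stateless, at the cost of the box-drawing branches used in the recursive answers.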

Upvotes: 0

Alex44

Reputation: 3855

For everyone who wants to stay with the h5py package:

This is not a one-liner from an implementation perspective, but it works with the h5py package alone. Once the recursive function below is defined, you can call it as a one-liner:

import h5py

filename_hdf = 'data.hdf5'

def h5_tree(val, pre=''):
    items = len(val)
    for key, item in val.items():
        items -= 1
        if items == 0:
            # the last item
            if isinstance(item, h5py.Group):
                print(pre + '└── ' + key)
                h5_tree(item, pre+'    ')
            else:
                try:
                    print(pre + '└── ' + key + ' (%d)' % len(item))
                except TypeError:
                    # len() raises TypeError for scalar datasets
                    print(pre + '└── ' + key + ' (scalar)')
        else:
            if isinstance(item, h5py.Group):
                print(pre + '├── ' + key)
                h5_tree(item, pre+'│   ')
            else:
                try:
                    print(pre + '├── ' + key + ' (%d)' % len(item))
                except TypeError:
                    print(pre + '├── ' + key + ' (scalar)')

with h5py.File(filename_hdf, 'r') as hf:
    print(hf)
    h5_tree(hf)

Upvotes: 10

ucy

Reputation: 11

I wanted to write the HDF5 structure to a text file, so I modified @Alex44's code to build the whole structure as a string, which you can then print or save.

import h5py

def h5_tree(val, pre='', out=""):
    length = len(val)
    for key, item in val.items():
        length -= 1
        if length == 0:  # the last item
            if isinstance(item, h5py.Group):
                out += pre + '└── ' + key + "\n"
                out = h5_tree(item, pre+'    ', out)
            else:
                out += pre + '└── ' + key + f' {item.shape}\n'
        else:
            if isinstance(item, h5py.Group):
                out += pre + '├── ' + key + "\n"
                out = h5_tree(item, pre+'│   ', out)
            else:
                out += pre + '├── ' + key + f' {item.shape}\n'
    return out

filename = "dummy.h5"
with h5py.File(filename, "r") as file:
    structure = h5_tree(file)
    print(structure)
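The snippet above stops at printing; the last step of saving the tree to a text file could look like the sketch below. It builds an equivalent tree string with a small hypothetical generator, tree_lines, so the block runs on its own; the file names dummy_tree.h5 and structure.txt are also placeholders. Passing encoding="utf-8" matters because the box-drawing characters (├──, └──, │) cannot be encoded by some platform default encodings, such as cp1252 on Windows.

```python
from pathlib import Path

import h5py
import numpy as np


def tree_lines(group, prefix=""):
    """Hypothetical helper: yield one line per member, with box-drawing prefixes."""
    keys = list(group.keys())
    for i, key in enumerate(keys):
        last = i == len(keys) - 1
        branch = "└── " if last else "├── "
        obj = group[key]
        if isinstance(obj, h5py.Group):
            yield prefix + branch + key
            # Continue the vertical bar only if more siblings follow.
            yield from tree_lines(obj, prefix + ("    " if last else "│   "))
        elif isinstance(obj, h5py.Dataset):
            yield prefix + branch + f"{key} {obj.shape}"
        else:
            yield prefix + branch + key  # e.g. a committed datatype


# Build a small hypothetical input file.
with h5py.File("dummy_tree.h5", "w") as f:
    f.create_dataset("a/b", data=np.zeros((3, 2)))
    f.create_dataset("c", data=1)

with h5py.File("dummy_tree.h5", "r") as f:
    structure = "\n".join(tree_lines(f)) + "\n"

# Write with an explicit encoding so the box-drawing characters survive.
Path("structure.txt").write_text(structure, encoding="utf-8")
print(Path("structure.txt").read_text(encoding="utf-8"))
```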

Upvotes: 1

kcw78

Reputation: 7996

You can also get the file schema/contents without writing any Python code or installing additional Python packages. If you just want to see the entire schema, take a look at the h5dump utility from The HDF Group. There are options to control the amount of detail that is dumped. Note: by default, h5dump dumps everything, including the data. To get a quick/small dump, use: h5dump -n 1 --contents=1 h5filename.h5

Another Python package is PyTables. It includes ptdump, a command-line tool to interrogate an HDF5 file (similar to h5dump above).

Finally, here are some tips if you want to programmatically access groups and datasets recursively in Python. h5py and tables (PyTables) each have methods to do this:

In h5py:
Use the object.visititems(callable) method. It recursively calls callable(name, obj) for each object in the tree below the group or file, passing the object's path name and the object itself.
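As a sketch of that approach, the callable below records every dataset's path, shape and dtype in a dictionary (the file name example_visit.h5 and the helper collect are hypothetical). Because visititems() stops as soon as the callable returns something other than None, the callable should return nothing unless you want to end the traversal early; the last lines use that behavior to find the first dataset in the tree.

```python
import h5py
import numpy as np

# Hypothetical file used only for this illustration.
with h5py.File("example_visit.h5", "w") as f:
    f.create_dataset("raw/temperature", data=np.arange(10, dtype=np.int32))
    f.create_dataset("raw/pressure", data=np.ones((2, 3)))
    f.create_group("processed")

datasets = {}


def collect(name, obj):
    # Record datasets only; groups are still traversed but not stored.
    if isinstance(obj, h5py.Dataset):
        datasets[name] = (obj.shape, obj.dtype)
    # No return value, so visititems() keeps walking the tree.


with h5py.File("example_visit.h5", "r") as f:
    f.visititems(collect)
    # A non-None return value stops the walk and is passed back to the caller.
    first = f.visititems(lambda name, obj: name if isinstance(obj, h5py.Dataset) else None)

for path, (shape, dtype) in sorted(datasets.items()):
    print(path, shape, dtype)
print("first dataset:", first)
```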

In PyTables:
PyTables has multiple ways to access groups, datasets and nodes. object.walk_nodes returns an iterator that walks the tree recursively, while object.iter_nodes (an iterator) and object.list_nodes (a list) return only the direct children of a group.

Upvotes: 3

AzyCrw4282

Reputation: 7744

Try using the nexusformat package to list the structure of the HDF5 file.

Install it with: pip install nexusformat

Code

import nexusformat.nexus as nx
f = nx.nxload('myhdf5file.hdf5')
print(f.tree)

This should print the entire structure of the file. For more on that, see this thread. Examples can be found here.

Upvotes: 4
