Derek Eden

Reputation: 4638

How to loop through all keys and values in an HDF5 file and determine which contain data?

I have results from a model simulation stored in an HDF5 file (.hdf).

I know how to open the file and browse the data using the h5py module.

The problem is that there are so many nested keys and datasets that it's a serious pain to find all of them and determine which actually have data in them.

This is what I am currently dealing with:

import h5py
f = h5py.File('results.hdf', 'r')  # open the file read-only

k1 = f.keys() #shows the keys in the first level

k1
<KeysViewHDF5 ['Event Conditions', 'Geometry', 'Plan Data', 'Results']>

Now, to see all the data that is stored, I can do something like:

for k1 in f:
    for k2 in f[k1].keys():
        for k3 in f[k1][k2].keys():
            print(f[k1][k2][k3])  

<HDF5 group "/Event Conditions/Unsteady/Boundary Conditions" (2 members)>
<HDF5 group "/Event Conditions/Unsteady/Initial Conditions" (0 members)>
<HDF5 dataset "Attributes": shape (350,), type "|V45">
<HDF5 dataset "Polyline Info": shape (350, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (350, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (3598, 2), type "<f8">
<HDF5 dataset "Attributes": shape (3,), type "|V37">
<HDF5 dataset "Polygon Info": shape (3, 4), type "<i4">
<HDF5 dataset "Polygon Parts": shape (3, 2), type "<i4">
<HDF5 dataset "Polygon Points": shape (344, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V64">
<HDF5 dataset "Cell Info": shape (1, 2), type "<i4">
<HDF5 dataset "Cell Points": shape (586635, 2), type "<f8">
<HDF5 group "/Geometry/2D Flow Areas/Delta" (0 members)>
<HDF5 group "/Geometry/2D Flow Areas/Perimeter 1" (25 members)>
<HDF5 dataset "Polygon Info": shape (1, 4), type "<i4">
<HDF5 dataset "Polygon Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Polygon Points": shape (610, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V60">
<HDF5 dataset "External Faces": shape (177,), type "|V24">
<HDF5 dataset "Polyline Info": shape (1, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (5, 2), type "<f8">
<HDF5 dataset "TIN Info": shape (347, 4), type "<i4">
<HDF5 dataset "TIN Points": shape (13591, 4), type "<f8">
<HDF5 dataset "TIN Triangles": shape (20008, 3), type "<i4">
<HDF5 dataset "XSIDs": shape (347, 2), type "<i4">
<HDF5 dataset "Attributes": shape (348,), type "|V676">
<HDF5 group "/Geometry/Cross Sections/Flow Distribution" (5 members)>
<HDF5 dataset "Manning's n Info": shape (348, 2), type "<i4">
<HDF5 dataset "Manning's n Values": shape (1044, 2), type "<f4">
<HDF5 dataset "Polyline Info": shape (348, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (348, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (696, 2), type "<f8">
<HDF5 dataset "Station Elevation Info": shape (348, 2), type "<i4">
<HDF5 dataset "Station Elevation Values": shape (151973, 2), type "<f4">
<HDF5 dataset "Attributes": shape (41,), type "|V32">
<HDF5 dataset "Calibration Table": shape (2,), type "|V200">
<HDF5 dataset "Polygon Info": shape (41, 4), type "<i4">
<HDF5 dataset "Polygon Parts": shape (41, 2), type "<i4">
<HDF5 dataset "Polygon Points": shape (45442, 2), type "<f8">
<HDF5 dataset "Polyline Info": shape (2, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (2, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (1768, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V96">
<HDF5 dataset "Polyline Info": shape (1, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (2042, 2), type "<f8">
<HDF5 dataset "Polyline Info": shape (2, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (2, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (1152, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V253">
<HDF5 dataset "Centerline Info": shape (1, 4), type "<i4">
<HDF5 dataset "Centerline Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Centerline Points": shape (48, 2), type "<f8">
<HDF5 dataset "Profiles": shape (500,), type "|V28">
<HDF5 dataset "Compute Messages (rtf)": shape (1,), type "|S293107">
<HDF5 dataset "Compute Messages (text)": shape (1,), type "|S215682">
<HDF5 dataset "Compute Processes": shape (6,), type "|V332">
<HDF5 group "/Results/Unsteady/Geometry Info" (3 members)>
<HDF5 group "/Results/Unsteady/Output" (1 members)>
<HDF5 group "/Results/Unsteady/Summary" (0 members)>

But if I keep doing this, first, it gets ridiculous and there's clearly a cleaner way; second, it starts to crash because some paths only go down a certain number of levels (once a path reaches a dataset there is no .keys() to call, so the next inner loop fails).
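A minimal sketch of a recursive walk that replaces the fixed number of nested loops: it descends into a member only when that member is an h5py.Group, so a path simply ends when it reaches a dataset and no try/except is needed. The function name walk is hypothetical; group.values(), the .name attribute (the object's full path) and h5py.Group are standard h5py.

```python
import h5py


def walk(group):
    """Hypothetical helper: recursively yield (path, object) for every
    group and dataset below `group`, at any depth."""
    for item in group.values():
        # item.name is the absolute path inside the file, e.g. "/Geometry/..."
        yield item.name, item
        # Only groups have children; a dataset ends this branch of the walk.
        if isinstance(item, h5py.Group):
            yield from walk(item)
```

For example, with h5py.File('results.hdf', 'r') opened as f, looping over walk(f) and printing each path and object lists every group and dataset regardless of how deeply it is nested.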

I want to know all possible keys/paths to data in the HDF file, and whether they contain data (some do not).
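What "contains data" means is an assumption in this sketch: an empty group (like the "(0 members)" groups in the output above) or a dataset with zero elements counts as having no data. The helper name has_data is hypothetical. A dataset with a nonzero shape may still hold only fill values if nothing was ever written to it; this check does not detect that.

```python
import h5py


def has_data(obj):
    """Hypothetical helper: rough check for whether an HDF5 object holds data.

    Assumption: a group has data if it has at least one member, and a
    dataset has data if its shape contains at least one element.
    """
    if isinstance(obj, h5py.Group):
        return len(obj) > 0
    if isinstance(obj, h5py.Dataset):
        # In recent h5py, size is None for datasets with an empty (null) dataspace.
        return obj.size is not None and obj.size > 0
    return False
```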

Possibly some kind of loop with try/except in it to handle the end of a path?

Please help if anyone knows how!

Thanks.

Upvotes: 5

Views: 8886

Answers (1)

Daniel Farrell

Reputation: 9750

Adapted from here; the relevant documentation is at http://docs.h5py.org/en/latest/high/group.html#Group.visit:

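import h5py

def print_attrs(name, obj):
    # Called once for every group and dataset in the file;
    # name is the object's path relative to the file root.
    print(name)
    for key, val in obj.attrs.items():
        print("    %s: %s" % (key, val))

# Open read-only; visititems walks the whole hierarchy recursively.
with h5py.File('foo.hdf5', 'r') as f:
    f.visititems(print_attrs)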

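It uses a callback (visitor-style) approach: you pass a callable, and h5py calls it once for every group and dataset below the starting group, passing the object's name (its path relative to that group) and the object itself. In your callable you can inspect each object and decide what to do. If the callable returns anything other than None, the visit stops early and that value is returned.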
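Applied to the original question, a callback can record each object's shape (or member count) instead of printing attributes, which makes it easy to spot empty groups and datasets. This is a minimal sketch; the function name list_contents and the report format are hypothetical, while visititems, h5py.Dataset and h5py.Group are standard h5py.

```python
import h5py


def list_contents(f):
    """Hypothetical helper: map every path under f to a short description."""
    report = {}

    def record(name, obj):
        # name is relative to f, e.g. "Geometry/2D Flow Areas/Delta"
        if isinstance(obj, h5py.Dataset):
            report[name] = f"dataset shape={obj.shape} dtype={obj.dtype}"
        elif isinstance(obj, h5py.Group):
            report[name] = f"group with {len(obj)} members"
        # No return value (None), so visititems keeps visiting.

    f.visititems(record)
    return report
```

Usage: open the file with h5py.File('results.hdf', 'r') as f, then print each path and description from list_contents(f).items(); entries such as "group with 0 members" are the empty ones.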

Upvotes: 3
