oldmansaur


Filter groups in hdf5 file using h5py

I have a problem for which I found an inelegant solution, and I want to know whether there is a better way to do it. (I am using Python 3.6.)

I want to store a set of results from experiments in different groups of an .hdf5 file. But I then want to be able to open the file, iterate over all the groups and only get the datasets from groups of a specific kind.

The inelegant solution I found is to encode the distinguishing information in the group name, for instance the 01 in "ExpA01".

Code to generate the file:

import h5py
import numpy as np


if __name__ == "__main__":
    # name of the file
    FileName = "myFile.hdf5"

    # open the file
    myFile = h5py.File(FileName, 'w')

    # list of groups
    NameList = ["ExpA01", "ExpA02", "ExpB01", "ExpB02"]

    for name in NameList:

        # create a new group with the name from NameList
        myFile.create_group(name)

        # create random data
        dataset = np.random.randint(0, 10, 10)
        # add data set to the group
        myFile[name].create_dataset("x", data=dataset)

    myFile.close()  # close the file

Now I want to read only the data from the groups whose names end in "01". To do so, I extract the information from the group name with the test myFile[k].name.split("/")[-1][-2::] == "01".

Code for reading the file:

import h5py
import numpy as np


if __name__ == "__main__":

    FileName = "myFile.hdf5"

    # open the file
    myFile = h5py.File(FileName, 'r')

    for k in myFile.keys():  # loop over all groups
        if (myFile[k].name.split("/")[-1][-2::] == "01"):
            data = np.zeros(myFile[k]["x"].shape)
            myFile[k]["x"].read_direct(data)

            print(data)

    myFile.close()  # close the file

In short, writing distinguishing information into the group name and then slicing the string is an ugly solution.

What is a better way of doing this?

Thanks for reading.


Answers (1)

kcw78


Have you considered adding an attribute to each group?
Then you could filter groups by testing the attribute value. Attributes are not limited to strings: my example uses a string, but ints and floats work as well.

# Quick example to create a group attribute, then retrieve it:
In [3]: h5f = h5py.File('attr_test.h5','w')
In [4]: grp = h5f.create_group('group1')
In [5]: h5f['group1'].attrs['key']='value'
In [6]: get_value = h5f['group1'].attrs['key']
In [7]: print(get_value)
value
In [8]: h5f.close()  # close before reopening the same file below
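
To illustrate the point about data types, here is a minimal sketch that stores a string, an integer and a float on one group and reads them back. The attribute names (label, run, temperature) and the file name are made up for illustration, and the in-memory core driver is only used so that nothing is written to disk.

```python
import h5py

# In-memory file: with driver="core" and backing_store=False the name is
# only a label and no file is created on disk.
with h5py.File("attr_types.h5", "w", driver="core", backing_store=False) as h5f:
    grp = h5f.create_group("group1")

    # Hypothetical attributes of three different types.
    grp.attrs["label"] = "control"       # string
    grp.attrs["run"] = 1                 # integer
    grp.attrs["temperature"] = 293.15    # float

    # attrs behaves like a dictionary: it supports items(), get(), "in", ...
    attr_summary = {key: value for key, value in grp.attrs.items()}

    # Numeric attributes can be compared directly when filtering.
    is_first_run = bool(grp.attrs["run"] == 1)

    # get() returns a default instead of raising KeyError for a missing key.
    missing = grp.attrs.get("not_set", "n/a")

print(attr_summary)
print(is_first_run, missing)
```

Using get() with a default is handy when only some groups carry a given attribute.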

Here is another example with two different values for the attribute. It creates 26 groups named group_a through group_z and sets the key attribute to 'vowel' for a/e/i/o/u and to 'consonant' for all other letters.

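import h5py

vowels = 'aeiou'
with h5py.File('attr_test.h5', 'w') as h5f:
    # create group_a ... group_z and tag each one with a 'key' attribute
    for code in range(ord('a'), ord('z') + 1):
        letter = chr(code)
        grp = h5f.create_group('group_' + letter)
        if letter in vowels:
            grp.attrs['key'] = 'vowel'
        else:
            grp.attrs['key'] = 'consonant'

    # read the attribute back from every group
    for name in h5f.keys():
        get_value = h5f[name].attrs['key']
        print(name, ':', get_value)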
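
Applied to the file from the question, the same idea might look like the sketch below. It assumes each experiment has a series letter and a run number. The attribute names series and run and the helpers write_experiments and read_run are hypothetical, and the temporary directory only keeps the example from overwriting a real file.

```python
import os
import tempfile

import h5py
import numpy as np


def write_experiments(path, experiments):
    """Create one group per (name, series, run) tuple and tag it with attributes."""
    with h5py.File(path, "w") as f:
        for name, series, run in experiments:
            grp = f.create_group(name)
            # Hypothetical attribute names; the metadata lives here,
            # not in the group name.
            grp.attrs["series"] = series
            grp.attrs["run"] = run
            grp.create_dataset("x", data=np.random.randint(0, 10, 10))


def read_run(path, run):
    """Return {group name: data} for every group whose 'run' attribute equals run."""
    selected = {}
    with h5py.File(path, "r") as f:
        for name, grp in f.items():
            # get() returns None if a group has no 'run' attribute.
            if grp.attrs.get("run") == run:
                selected[name] = grp["x"][()]  # read the whole dataset
    return selected


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "myFile.hdf5")
    experiments = [("ExpA01", "A", 1), ("ExpA02", "A", 2),
                   ("ExpB01", "B", 1), ("ExpB02", "B", 2)]
    write_experiments(path, experiments)
    for name, data in read_run(path, 1).items():
        print(name, data)
```

Because the filter compares an integer attribute, the group names no longer need to encode the run number, and they can be changed without breaking the reader.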

