Reputation:
I decided to store my data in HDF5 using its hierarchical structure instead of relying on the filesystem. Unfortunately, I'm having performance issues.
My data is organized as follows: I have about 70 top-level groups, one per date, and each of them contains roughly 8000 datasets. I would like to see a list of the number of datasets per day:
for date in hdf5.keys():
    print(len(hdf5[date]))
I'm finding it frustrating that this takes 2+ seconds per iteration.
Also, I have two different HDF5 files with the above layout, and the larger one is much slower at this.
What am I doing wrong?
Upvotes: 2
Views: 4387
Reputation: 571
Try creating the file with the libver='latest' flag:
f = h5py.File('name.hdf5', 'w', libver='latest')
This is much faster if you have a lot of datasets per group or a lot of attributes per dataset, because the newer file format can store group links and attributes in a more efficient (dense) layout.
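If your existing files were written with the default format, one option is to copy each date group into a new file created with libver='latest', so groups holding thousands of datasets use the newer storage. This is a minimal sketch only: the file names are placeholders, and repacking by copying is my assumption, not something stated above.

import h5py

src_path = 'old_layout.hdf5'   # placeholder path for the existing file
dst_path = 'new_layout.hdf5'   # placeholder path for the repacked file

# Copy every top-level date group into a file created with the latest format.
with h5py.File(src_path, 'r') as src, h5py.File(dst_path, 'w', libver='latest') as dst:
    for date in src.keys():
        src.copy(date, dst)

# Counting datasets per day on the repacked file:
with h5py.File(dst_path, 'r') as f:
    for date in f.keys():
        print(date, len(f[date]))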
Upvotes: 1