user674155

Reputation:

Listing datasets in a group in HDF5

I decided to store my data in HDF5 using its hierarchical structure instead of relying on the filesystem. Unfortunately, I'm having performance issues.

My data is laid out as follows: I have about 70 top-level groups, one per date, and each of them contains roughly 8000 datasets. I would like to print the number of datasets per day:

for date in hdf5.keys():
    print(len(hdf5[date]))

I'm finding it a little frustrating that this takes more than 2 seconds per iteration.

Also, I have two different HDF5 files with the above layout, and the bigger one is much slower at this.

What am I doing wrong?

Upvotes: 2

Views: 4387

Answers (1)

John Readey

Reputation: 571

Try creating the file with the libver latest flag:

# 'w' creates a new file and overwrites any existing file of that name
f = h5py.File('name.hdf5', 'w', libver='latest')

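This should be much faster when a group holds many datasets or a dataset carries many attributes, because the newer file format stores links and attributes in structures that scale better with large counts.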
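To make that concrete, here is a minimal sketch of writing a file with libver='latest' and then counting the members of each top-level group. The file name, the group names, the dataset names, and the two helper functions are made up for illustration and are not from the question. Note that len() on a group counts every member, so the result equals the dataset count only if the groups hold nothing but datasets.

```python
import os
import tempfile

import h5py


def build_example_file(path, n_groups=3, n_datasets=50):
    """Write a small file shaped like the one in the question (hypothetical names)."""
    # Assumption: libver only governs how objects are written, so it is
    # passed when the file is created rather than when it is read.
    with h5py.File(path, "w", libver="latest") as f:
        for g in range(n_groups):
            group = f.create_group(f"2020-01-{g + 1:02d}")
            for d in range(n_datasets):
                group.create_dataset(f"dataset_{d}", data=[d])


def count_members_per_group(path):
    """Return {group name: number of members} for each top-level group."""
    with h5py.File(path, "r") as f:
        return {name: len(f[name]) for name in f.keys()}


path = os.path.join(tempfile.mkdtemp(), "example.hdf5")
build_example_file(path)
counts = count_members_per_group(path)
for date, n in counts.items():
    print(date, n)
```

An existing file will probably not benefit just from being reopened with the flag. As far as I know, the groups have to be rewritten into a file that was created with it.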

Upvotes: 1
