ja.abell

Reputation: 133

Best way to access an HDF5 file in which datasets have consecutive names

This is a stripped-down version of the problem I have:

I have a file with many (possibly millions of) datasets, all under the same group, like so:

"/Group"
  + Dataset0001  [double arrays, 2 to 3 dimensions, and a lot of data]
  + Dataset0002
  + Dataset0003
  + ...
  + DatasetXXXX

The datasets are chunked and are written within a loop that only knows one slice of each dataset per iteration, so every iteration writes a partial piece of every dataset. This means that, for each dataset on each iteration, I have to build the string with the dataset's name, ask HDF5 to look it up, and get the handle so I can write to it.

This is slow.

Is there a way to get the handle faster by, say, using the offset of the data in the file?

Upvotes: 0

Views: 601

Answers (2)

Yossarian

Reputation: 5471

If you don't mind using the C API, there is H5Literate. It iterates over the links in a group and calls a callback function for each one, so you can apply a function to all datasets in the group. I think the only catch is that your callback shouldn't throw exceptions.

Upvotes: 1

Simon

Reputation: 32873

Make an array of dataset names (or better: dataset handles) during initialization. Then you won't have to form the strings at each iteration. Time is expensive, memory is cheap!
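A minimal sketch of that idea in Python, under some assumptions: the names follow the "/Group/DatasetNNNN" pattern from the question with four-digit zero padding, and `dataset_names`, `HandleCache`, and `fake_open` are hypothetical helpers made up for illustration. The opener is a stand-in so the sketch runs without an HDF5 library; in real code it would be whatever opens a dataset by name (for example `H5Dopen2` in the C API, or indexing an `h5py.File` by name).

```python
from typing import Callable, Dict, List


def dataset_names(count: int, prefix: str = "/Group/Dataset", width: int = 4) -> List[str]:
    """Build every zero-padded dataset name once, before the write loop."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


class HandleCache:
    """Open each dataset at most once and reuse the handle afterwards."""

    def __init__(self, opener: Callable[[str], object]) -> None:
        self._opener = opener              # assumption: any "open by name" callable
        self._handles: Dict[str, object] = {}
        self.open_calls = 0                # counts real lookups, for illustration only

    def get(self, name: str) -> object:
        handle = self._handles.get(name)
        if handle is None:
            handle = self._opener(name)    # the expensive lookup happens only here
            self._handles[name] = handle
            self.open_calls += 1
        return handle


def fake_open(name: str) -> dict:
    """Stand-in opener so the sketch runs without an HDF5 library installed."""
    return {"name": name, "writes": 0}


names = dataset_names(3)                   # strings are formed once, up front
cache = HandleCache(fake_open)
for _iteration in range(5):                # outer loop: one slice per dataset per pass
    for name in names:
        cache.get(name)["writes"] += 1     # a real loop would write the slice here
```

With this arrangement the name lookup runs once per dataset instead of once per dataset per iteration. Keeping millions of handles open does cost memory, and real handles would still need to be closed when the loop finishes.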

That being said, a single dataset with one more dimension would probably be more efficient than millions of identically sized datasets with sequential names (if that's an option).

Upvotes: 1
