Reputation: 133
This is a stripped-down-to-essentials version of the problem I have:
I have a file with many (maybe millions of) datasets, all under the same group, like so:
"/Group" + Dataset0001 [Double arrays 2 to 3 dimensions and a lot of data] + Dataset0002 + Dataset0003 + ... + DatasetXXXX
The datasets are chunked and are written inside a loop that only knows one slice of each dataset per iteration, so every iteration performs a partial write to every dataset. This means that on each iteration, for every dataset, I have to form the string with its name, tell HDF5 to look it up, and get the handle so I can write to it.
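Schematically, in C-API terms (the names, sizes, and loop counts below are made up; the real code is equivalent), each iteration does this:

    #include <hdf5.h>
    #include <stdio.h>

    #define NITERS 100     /* hypothetical */
    #define NDSETS 9999    /* hypothetical */

    int main(void)
    {
        hid_t file  = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
        hid_t group = H5Gopen2(file, "/Group", H5P_DEFAULT);

        for (int iter = 0; iter < NITERS; iter++) {
            for (int i = 0; i < NDSETS; i++) {
                char name[32];
                snprintf(name, sizeof name, "Dataset%04d", i + 1); /* form the name */
                hid_t dset = H5Dopen2(group, name, H5P_DEFAULT);   /* name lookup */
                /* ... select this iteration's slice (hyperslab) and H5Dwrite ... */
                H5Dclose(dset);
            }
        }

        H5Gclose(group);
        H5Fclose(file);
        return 0;
    }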
This is slow.
Is there a way to get the handle faster by, say, using the offset of the data in the file?
Upvotes: 0
Views: 601
Reputation: 5471
If you don't mind using the C API, there is H5Literate. It allows you to apply a function to all the datasets in a group. I think the only catch is that your function shouldn't throw exceptions.
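A minimal sketch of what that looks like (the file and group names are placeholders, and error checking is omitted):

    #include <hdf5.h>
    #include <stdio.h>

    /* Called once per link in the group. If you drive this from C++,
     * don't let exceptions escape this callback. */
    static herr_t visit_cb(hid_t group, const char *name,
                           const H5L_info_t *info, void *op_data)
    {
        (void)info; (void)op_data;
        hid_t dset = H5Dopen2(group, name, H5P_DEFAULT);
        if (dset < 0)
            return -1;              /* a negative return aborts the iteration */
        /* ... write to the dataset, or stash the handle via op_data ... */
        printf("visited %s\n", name);
        H5Dclose(dset);
        return 0;                   /* 0 continues the iteration */
    }

    int main(void)
    {
        hid_t file  = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
        hid_t group = H5Gopen2(file, "/Group", H5P_DEFAULT);

        /* Visit every link in /Group in increasing name order. */
        H5Literate(group, H5_INDEX_NAME, H5_ITER_INC, NULL,
                   visit_cb, NULL);

        H5Gclose(group);
        H5Fclose(file);
        return 0;
    }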
Upvotes: 1
Reputation: 32873
Make an array of dataset names (or better: dataset handles) during initialization. Then you won't have to form the strings at each iteration. Time is expensive, memory is cheap!
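For example (a sketch; the dataset count and the name pattern are assumptions based on the question):

    #include <hdf5.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NDSETS 9999   /* hypothetical dataset count */

    int main(void)
    {
        hid_t file  = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
        hid_t group = H5Gopen2(file, "/Group", H5P_DEFAULT);

        /* Form each name exactly once, during initialization. */
        hid_t *dsets = malloc(NDSETS * sizeof *dsets);
        for (int i = 0; i < NDSETS; i++) {
            char name[32];
            snprintf(name, sizeof name, "Dataset%04d", i + 1);
            dsets[i] = H5Dopen2(group, name, H5P_DEFAULT);
        }

        /* Write loop: no string formatting, no name lookup -- just use
         * the cached handles, e.g. H5Dwrite(dsets[i], ...). */

        for (int i = 0; i < NDSETS; i++)
            H5Dclose(dsets[i]);
        free(dsets);
        H5Gclose(group);
        H5Fclose(file);
        return 0;
    }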
That being said, a single dataset with one more dimension would probably be more efficient than millions of identically sized datasets with sequential names (if that's an option).
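In that layout, every former DatasetNNNN becomes one slab along the extra (first) dimension, and each partial write is a single hyperslab selection. A sketch, assuming a 3-D dataset /Group/AllData of shape (NDSETS, NY, NX); the name and shape are hypothetical:

    #include <hdf5.h>

    #define NY 64   /* hypothetical slice shape */
    #define NX 64

    int main(void)
    {
        hid_t file   = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
        hid_t dset   = H5Dopen2(file, "/Group/AllData", H5P_DEFAULT);
        hid_t fspace = H5Dget_space(dset);

        double slice[NY][NX] = {{0}};  /* ... fill with this iteration's data ... */

        /* Select slab i = 42 along the first dimension (what used to be
         * Dataset0043) and write it in one call. */
        hsize_t start[3] = {42, 0, 0};
        hsize_t count[3] = {1, NY, NX};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

        hsize_t mdims[2] = {NY, NX};
        hid_t mspace = H5Screate_simple(2, mdims, NULL);

        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, slice);

        H5Sclose(mspace);
        H5Sclose(fspace);
        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }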
Upvotes: 1