The Dude

Reputation: 4005

Fast slicing .h5 files using h5py

I have little experience working with .h5 files.

In a script I wrote, I load data from an .h5 file. The shape of the resulting array is [3584, 3584, 75]. Here 3584 is the number of pixels along each image axis, and 75 is the number of time frames. Loading the data and printing the shape takes 180 ms, which I measured using os.times().

To look at the data at a specific time frame, I use the following piece of code:

data_1 = data[:, :, 1]

The slicing takes a lot of time (1.76 s). I understand that my 2D array is huge, but at some point I would like to loop over time, which will take very long because I'm performing this slice inside the for loop.

Is there a more efficient, less time-consuming way of slicing the time frames or handling this type of data?

Thank you!

Upvotes: 0

Views: 669

Answers (1)

Lasse V. Karlsen

Reputation: 391346

Note: I'm making assumptions here since I'm unfamiliar with .h5 files and the Python code that accesses them.

I think that what is happening is that when you "load" the array, you're not actually loading an array. Instead, I think that an object is constructed on top of the file. It probably reads in dimensions and information related to how the file is organized, but it doesn't read the whole file.

That object mimics an array so well that when you later perform the slice operation, the normal Python slice syntax works, but it is only at this point that the actual data is read. That's why the slice takes so much longer than "loading" all the data.
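A minimal sketch of this behaviour with h5py, assuming a small stand-in file; the file name `frames.h5` and the dataset key `"data"` are made up for illustration and are not taken from the question:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file (hypothetical name and dataset key).
path = os.path.join(tempfile.mkdtemp(), "frames.h5")
small = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=small)

with h5py.File(path, "r") as f:
    dset = f["data"]                         # an h5py.Dataset handle, not an in-memory array
    shape = dset.shape                       # available from the file's metadata
    is_array = isinstance(dset, np.ndarray)  # False: the pixel data has not been loaded
    frame = dset[:, :, 1]                    # slicing is what reads the data from the file
    frame_is_array = isinstance(frame, np.ndarray)  # True: the result is a NumPy array
```

If the whole dataset fits in memory, reading it once with `dset[...]` and slicing the resulting NumPy array inside the loop would avoid going back to the file on every iteration.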

I arrive at this conclusion because of the following.

If you're reading 75 frames of 3584x3584 pixels, I'm assuming they're uncompressed (H5 seems to be just raw dumps of data), and in that case, 75 * 3584 * 3584 = 963,379,200 bytes, which is around 918 MB of data. Combine that with you "reading" this in 180 ms, and we get this calculation:

918MB / 180ms = 5.1GB/second reading speed
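The same arithmetic as a short Python sketch, assuming 1 byte per pixel and uncompressed data; the variable names are only illustrative:

```python
# Figures taken from the question; bytes_per_pixel is an assumption.
width = height = 3584
frames = 75
bytes_per_pixel = 1
load_seconds = 0.180

total_bytes = width * height * frames * bytes_per_pixel
total_mib = total_bytes / 2**20            # size in MiB
implied_rate = total_mib / load_seconds    # MiB per second if everything were read

print(total_bytes)                     # 963379200
print(f"{total_mib:.2f} MiB")          # 918.75 MiB
print(f"{implied_rate:.0f} MiB/s")     # 5104 MiB/s, roughly the 5.1 GB/s quoted
```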

Note, this number is for 1-byte pixels, which is also unlikely.

This speed thus seems highly unlikely, as even the best SSDs today reach way below 1GB/sec.

It seems much more plausible that an object is just constructed on top of the file and the slice operation incurs the cost of reading at least 1 frame worth of data.

If we divide that speed by 75 to get a per-frame speed, that is, the rate needed to read just one frame in the same amount of time, we get 68 MB/sec for 1-byte pixels, and with 24-bit or 32-bit pixels we get up to 270 MB/sec. Much more plausible.
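A rough sketch of that per-frame estimate for a few pixel sizes. It reads the division by 75 as "one frame's worth of data in 180 ms", which is an interpretation of the numbers above, and the pixel sizes are guesses rather than anything known about the file:

```python
# One 3584 x 3584 frame, read in the 180 ms from the question.
frame_pixels = 3584 * 3584
seconds = 0.180

# Read rate in MiB/s for 1-byte, 24-bit and 32-bit pixels (assumed sizes).
rates = {
    bytes_per_pixel: frame_pixels * bytes_per_pixel / 2**20 / seconds
    for bytes_per_pixel in (1, 3, 4)
}

for bytes_per_pixel, rate in rates.items():
    print(f"{bytes_per_pixel} byte(s) per pixel: {rate:.0f} MiB/s")
# 1 byte(s) per pixel: 68 MiB/s
# 3 byte(s) per pixel: 204 MiB/s
# 4 byte(s) per pixel: 272 MiB/s
```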

Upvotes: 1
