Reputation: 135
I am looking for negative values in a .hdf5 file that has the following architecture:
- Incidence_0
- Wavelength_0
- (Table of size m * n)
- Wavelength_1
- (Table of size m * n)
...
- Incidence_1
...
...
My objective is to find every negative value, and to get back its exact position in the file (ie, the number of the incidence, the number of the wavelength, and its position in the associated table).
I am sorry that I cannot give and minimal reproducible example because I cannot give the file that I'm using, but here is the idea.
import h5py
file = h5py.File('testFile.hdf5', 'r')
result = []
for incidence in range(nbIncidence):
for wavelength in range(nbWavelength):
for theta in range(nbTheta):
for phi in range(nbPhi):
value = file['Incidence_' + str(incidence)]['Wavelength_' + str(wavelength)][theta, phi]
if (value < 0):
result.append([value, incidence, wavelength, theta, phi])
This is perfectly working, but using four loops is time-consuming, especially if I have to work on huge files, that may probably happen... I don't know enough the h5py library but I am pretty sure that it exists a way to do this way faster than that.
Upvotes: 0
Views: 634
Reputation: 8046
First, the bad news: h5py
doesn't have a function to interrogate your data in the way you described. The good news: you can accomplish your task by extracting each Incident/Wavelength dataset to an NumPy array, then combining 2 NumPy methods to operate on the extracted array. [Note: This assumes you have sufficient memory to load each dataset.]
Some observations on working with this data (to help you follow my example).
.keys()
method. (Or, you can get (name, object)
tuples with the .items()
method.)arr = file['Incidence_#']['Wavelength_#'][()]
arr < 0
will return True
for all negative values (and False
for other values).np.argwhere
to find array element indices that are non-zero. Use this on the boolean array (remembering True is non-zero, and False is zero).I created a simple example that mimics your schema to demonstrate the process. For completeness, that code is at the end. It's a small file that won't bury you in output.
Code below reads the data, finds negative values, and adds data to a list. It has several print statements so you can see how each step works. They aren't need once you are confident in the procedure.
with h5py.File('testFile.hdf5', 'r') as h5fr:
result = []
for i_grp in h5fr.keys():
for wave_ds in h5fr[i_grp].keys():
wave_arr = h5fr[i_grp][wave_ds][()]
neg_idx = np.argwhere(wave_arr < 0.0)
wave_res = []
for n in neg_idx:
i, j = n[0], n[1]
result.append([wave_arr[i,j], i_grp, wave_ds, i, j])
wave_res.append([wave_arr[i,j], i_grp, wave_ds, i, j])
print(f'\nResults for {i_grp}; {wave_ds}:')
print(wave_res)
Code to create the example file used above:
nbIncidence = 4
nbWavelength = 6
m, n = 10, 10 # m,n same as nbTheta, nbPhi??
with h5py.File('testFile.hdf5', 'w') as h5fw:
for i_cnt in range(nbIncidence):
grp = h5fw.create_group('Incidence_' + str(i_cnt))
for w_cnt in range(nbWavelength):
arr = np.random.uniform(low=-1.0, high=10.0, size=(m,n)) #.reshape(m,n)
grp.create_dataset('Wavelength_' + str(w_cnt), data=arr)
Upvotes: 1