E_Murphy

Reputation: 1

Initializing or populating multiple numpy arrays from h5 file groups

I have an h5 file with 5 groups, each group containing a 3D dataset. I am looking to build a for loop that extracts each group into a numpy array and assigns that array to an object named after the group header. I can get a number of different methods to work with one group, but when I try to build a for loop that applies the code to all 5 groups, it breaks. For example:

import h5py as h5
import numpy as np

f = h5.File("FFM0012.h5", "r+") #read in h5 file
print(list(f.keys())) #['FFM', 'Image'] for my dataset
FFM = f['FFM'] #Generate object with all 5 groups
print(list(FFM.keys())) #['Amp', 'Drive', 'Phase', 'Raw', 'Zsnsr'] for my dataset

Amp = FFM['Amp'] #Generate object for 1 group
Amp = np.array(Amp) #Turn into numpy array, this works.

Now when I try to apply the same logic with a for loop:

h5_keys = [] 
FFM.visit(h5_keys.append) #Create list of group names ['Amp', 'Drive', 'Phase', 'Raw', 'Zsnsr']

for h5_key in h5_keys:
    tmp = FFM[h5_key]
    h5_key = np.array(tmp)

print(Amp[30,30,30]) #To check that array is populated

When I run this code I get "NameError: name 'Amp' is not defined". I've tried initializing the numpy array before the for loop with:

h5_keys = [] 
FFM.visit(h5_keys.append) #Create list of group names

Amp = np.array([])
for h5_key in h5_keys:
    tmp = FFM[h5_key]
    h5_key = np.array(tmp)

print(Amp[30,30,30]) #To check that array is populated

This produces the error message "IndexError: too many indices for array"

I've also tried generating a dictionary and creating numpy arrays from the dictionary. That is a similar story where I can get the code to work for one h5 group, but it falls apart when I build the for loop.

Any suggestions are appreciated!

Upvotes: 0

Views: 311

Answers (1)

hpaulj

Reputation: 231475

You seem to have jumped to using h5py and numpy before learning much Python. Look at what your loop actually does:

Amp = np.array([])        # creates a numpy array with 0 elements
for h5_key in h5_keys:    # h5_key is set to a new value each iteration
    tmp = FFM[h5_key]
    h5_key = np.array(tmp)    # this rebinds h5_key to the array; Amp is never touched

print(Amp[30,30,30])      # Amp is the original (0,) shape array

Try this basic Python loop, paying attention to the value of i:

alist = [1,2,3]
for i in alist:
    print(i)
    i = 10
    print(i)
print(alist)       # no change to alist
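The same point can be sketched with string keys, which is closer to your case. This is only an illustration: fake_group is a hypothetical plain dict standing in for the h5py group, and loaded is a name I made up for the result. Assigning to the loop variable never creates a variable called Amp; storing each value in a dictionary under its key does give you name-based access.

```python
# Hypothetical stand-in for the h5py group: names mapped to some data.
fake_group = {"Amp": [1, 2, 3], "Drive": [4, 5, 6]}

# What the question's loop does: rebinding the loop variable.
for key in fake_group:
    key = fake_group[key]   # key now points at the data; no variable named Amp is created

# What works instead: collect the values in a dictionary keyed by name.
loaded = {}
for key in fake_group:
    loaded[key] = fake_group[key]

print(loaded["Amp"])        # access by name through the dictionary
```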

f is the file.

FFM = f['FFM'] 

is a group

Amp = FFM['Amp']

is a dataset. There are various ways to load a dataset into a numpy array. I believe [...] slicing is the currently preferred one. .value used to be used, but it is deprecated and has been removed as of h5py 3.0 (loading dataset)

Amp = FFM['Amp'][...]

is an array.

alist = [FFM[key][...] for key in h5_keys]

should create a list of arrays from the FFM group.

If the shapes are compatible, you can concatenate the arrays into one array:

np.array(alist)
np.stack(alist)
np.concatenate(alist, axis=0)   # or other axis

etc
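To see how these differ, here is a small sketch using made-up arrays in place of the loaded datasets (the shape (2, 3, 4) is an assumption chosen only to keep the numbers small). np.array and np.stack add a new leading axis, while np.concatenate joins along an existing one.

```python
import numpy as np

# Three stand-in arrays with identical shapes, as if loaded from three datasets.
alist = [np.zeros((2, 3, 4)), np.ones((2, 3, 4)), np.full((2, 3, 4), 2.0)]

as_array = np.array(alist)               # new leading axis, one entry per input array
stacked = np.stack(alist)                # same result; axis=0 is the default
joined = np.concatenate(alist, axis=0)   # no new axis; the first axis grows instead

print(as_array.shape, stacked.shape, joined.shape)
```

With np.stack you can still pick out one original array as stacked[1]; with np.concatenate the boundaries between the inputs are no longer part of the shape.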

adict = {key: FFM[key][...] for key in h5_keys}

should create a dictionary of arrays keyed by dataset names.

In Python, lists and dictionaries are the ways of accumulating objects. The h5py groups behave much like dictionaries. Datasets behave much like numpy arrays, though they remain on the disk until loaded with [...].
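Here is a self-contained sketch of that idea. It builds a small in-memory file rather than reading your FFM0012.h5, so the file name, the dataset shapes and the values are all assumptions for illustration. It iterates over the group like a dictionary, loads every dataset into a dictionary of arrays, and reads just a slice of one dataset without loading the rest.

```python
import h5py
import numpy as np

# In-memory demo file (nothing is written to disk); names and shapes are made up.
with h5py.File("demo_in_memory.h5", "w", driver="core", backing_store=False) as f:
    ffm = f.create_group("FFM")
    for name in ("Amp", "Drive", "Phase"):
        data = np.arange(120, dtype=float).reshape(4, 5, 6)
        ffm.create_dataset(name, data=data)

    # A group iterates like a dictionary: items() yields (name, object) pairs.
    arrays = {name: dset[...] for name, dset in ffm.items()
              if isinstance(dset, h5py.Dataset)}

    # Indexing a dataset reads only the requested part into memory.
    corner = ffm["Amp"][0, :2, :2]
    dataset_type = type(ffm["Amp"]).__name__

print(sorted(arrays))            # dataset names found in the group
print(arrays["Amp"].shape)       # a regular numpy array, usable after the file is closed
print(dataset_type, type(arrays["Amp"]).__name__)
```

For the original question, arrays["Amp"][30, 30, 30] would then take the place of the failing Amp[30,30,30] check, assuming the real datasets are at least that large along each axis.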

Upvotes: 1
