Python - initialising and storing data of varying size and dimension

Question

I have multiple files of 3D data that I am trying to read in to my Python script in a more efficient manner. Currently I am changing everything manually rather than by loops, so I would like to automate this a bit.

The data breaks down like this: each file represents data at a different location. Each location has been post-processed with multiple different filters. Each of these sets of filtered data can be loaded in as a 3D array of data (which is 2D space + parameter information in the 3rd dimension).

I currently have two lists, path_list and filter_size, which contain all data file names and all post-processing filter sizes respectively.

My plan is to loop over these as:

path_list = ['1', '2', '3', '4', '5']
filter_size = ['1', '2', '3']
for counter, value in enumerate(path_list):
    path = value
    for counter, value in enumerate(filter_size):
        filter = value

and save my data for each filter and path in some structure grid_block. However, I do not know how to initialise grid_block correctly. grid_block dimensions for all of my current files are constant but could change in the future so I would rather avoid hard-coding in any dimensions. If I were to initialise as:

grid_block = np.zeros((np.size(path_list),np.size(filter_size)))
for counter, value in enumerate(path_list):
    path = value
    for counter, value in enumerate(filter_size):
        filter = value

Then I would end up with a 2D array. I am wondering if it is possible for me to then initialise each element of this 2D array as an empty 3D array? Then each element of the 2D array grid_block would store the 3D data associated with a path and filter. That is, can I do something like:

#Pseudocode:
#grid_block is initialised here as 2D array, 
# ...with each element being a 3D array of unknown/unspecified size
for counter, value in enumerate(path_list):
    path = value
    i = counter
    for counter, value in enumerate(filter_size):
        filter = value
        j = counter
        #Pseudocode:
        #grid_block[i, j] = load_in_data(path,filter)

Is this possible or am I over-thinking this?

Edit: It's difficult to show exactly how the data is being loaded in because the file types are not standard. For the purposes of the question we can just say that data is loaded in by calling some function get_data. Currently it then all ends up looking something like this:

path_list = ['1', '2', '3', '4', '5']
filter_size = ['1', '2', '3']
for counter, value in enumerate(path_list):
    path = value
    for counter, value in enumerate(filter_size):
        filter = value
        temp = get_data(path, filter)

Where temp is a 3D array of size (x_points, y_points, variables). So really I would like to load each temp into a single array or list which I am calling grid_block. So ideally here, if grid_block were of size (5x3) the element grid_block[3,1] would recall the temp data that had been stored for path = '4', filter = '2'.

Anurag Reddy · Accepted Answer

Since the temp arrays may vary in size, I would recommend appending each one to a list and using a NumPy array to record where each (path, filter) result sits in that list.

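import numpy as np
# Initialise the list of loaded arrays and a grid of list positions (-1 = nothing stored yet)
grid_list = []
grid_count_arr = np.full((len(path_list), len(filter_size)), -1, dtype=int)
curr_list_counter = 0
# Inside the loop, where i and j are the indices of the current path and filter
grid_list.append(temp)
grid_count_arr[i, j] = curr_list_counter
curr_list_counter += 1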

This way you can retrieve the data for path index i and filter index j by looking up its position in the list:

grid_list[grid_count_arr[i, j]]
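
Putting the pieces together, a minimal end-to-end sketch might look like the following. The get_data function here is a hypothetical stand-in for the real loader (the question does not show it), and the array shapes it returns are made up purely so that they differ between files. The names filt and lookup are also illustrative; filt is used instead of filter only to avoid shadowing the Python built-in.

```python
import numpy as np

path_list = ['1', '2', '3', '4', '5']
filter_size = ['1', '2', '3']


def get_data(path, filt):
    """Hypothetical stand-in for the real loader (an assumption).

    Returns a 3D array of shape (x_points, y_points, variables) whose
    size depends on the inputs, filled with a recognisable value.
    """
    x_points = 2 + int(path)   # made-up sizes so the shapes vary
    y_points = 3 + int(filt)
    return np.full((x_points, y_points, 4), float(path) * 10 + float(filt))


# List holding every loaded 3D array, in the order it was loaded.
grid_list = []
# 2D grid mapping (path index, filter index) to a position in grid_list;
# -1 marks a slot that has not been filled yet.
grid_count_arr = np.full((len(path_list), len(filter_size)), -1, dtype=int)

for i, path in enumerate(path_list):
    for j, filt in enumerate(filter_size):
        grid_count_arr[i, j] = len(grid_list)  # position the next append will get
        grid_list.append(get_data(path, filt))


def lookup(i, j):
    """Return the 3D array stored for path index i and filter index j."""
    return grid_list[grid_count_arr[i, j]]


# Data for path = '4' (index 3) and filter = '2' (index 1).
block = lookup(3, 1)
print(block.shape)
```

No array dimensions are hard-coded here: each entry keeps whatever shape the loader returns. Other containers would likely work as well, for example a dict keyed by (path, filter) or a NumPy array created with dtype=object; the list plus index array is simply the approach described in this answer.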
