Numpy matrices - how to handle arbitrary size

Question

I have some code which carries out various processing functions on a matrix of input data. The input data may be 2, 3 or 4D.

I need to pull (n-1)-dimensional sub-arrays out of the input data for processing, and then store the processed data in another matrix. What is the best way to do this so that it handles the varying sizes of input data?

For example, at the moment I have code of the following form:

import numpy as np

if number_dims == 2:
    output_matrix = np.zeros([size_dim1, final_size_dim2])
    for i in range(size_dim1):
        data_to_process = input_data[i, :]
        output_matrix[i, :] = processing_funcs(data_to_process)

if number_dims == 3:
    output_matrix = np.zeros([size_dim1, final_size_dim2, final_size_dim3])
    for i in range(size_dim1):
        data_to_process = input_data[i, :, :]
        output_matrix[i, :, :] = processing_funcs(data_to_process)

if number_dims == 4:
    output_matrix = np.zeros([size_dim1, final_size_dim2, final_size_dim3, final_size_dim4])
    for i in range(size_dim1):
        data_to_process = input_data[i, :, :, :]
        output_matrix[i, :, :, :] = processing_funcs(data_to_process)

Is there a good way to do this in Python without the repeated if statements? The complication is that the final data size in the n-1 indirect dimensions is not the same as the input size, so I can't, for example, just do:

output_matrix = np.zeros(np.shape(input_data))

It would also be good if there was a way to take slices along dimension 1, regardless of how many other dimensions there are.

hpaulj · Accepted Answer

These are functionally the same:

output_matrix[i,:,:,:]
output_matrix[i,...]
output_matrix[i]

or more generally:

x[:,i,j,:,:]
x[:,i,j,...]
x[:,i,j]

As long as it is clear which dimensions are being indexed, trailing ':' can be omitted or replaced with an ellipsis ('...'). An ellipsis can also be used at the beginning or in the middle, again provided the expression is not ambiguous (an index expression can contain only one ellipsis).
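A quick way to convince yourself of the equivalences above is a small check like the following. It is only a sketch: the array `x` and its shape are made up for illustration.

```python
import numpy as np

# Hypothetical 4-D array; the shape is chosen only for illustration.
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

i, j = 1, 2

# Trailing full slices, an ellipsis, and plain indexing select the same block.
a = x[i, :, :, :]
b = x[i, ...]
c = x[i]
same_leading = np.array_equal(a, b) and np.array_equal(b, c)

# The same holds when the indexed axes are not the first ones.
d = x[:, i, j, :]
e = x[:, i, j, ...]
f = x[:, i, j]
same_middle = np.array_equal(d, e) and np.array_equal(e, f)

# An ellipsis at the front stands in for all leading axes.
last = x[..., 0]  # same as x[:, :, :, 0]
```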

np.take and np.put are also useful when indexing on certain axes.
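As a rough illustration of the axis-based style, here is np.take with its `axis` argument on a made-up 3-D array. Note that np.put indexes into the flattened array rather than along a chosen axis, so only np.take is shown.

```python
import numpy as np

# Hypothetical 3-D array; the shape is chosen only for illustration.
x = np.arange(24).reshape(2, 3, 4)

# Take index 1 along axis 1 without spelling out the other axes.
taken = np.take(x, 1, axis=1)  # equivalent to x[:, 1, :]

# A list of indices keeps the axis, with one entry per requested index.
taken_many = np.take(x, [0, 2], axis=1)  # equivalent to x[:, [0, 2], :]
```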

You can also create a tuple of indexes (including slice objects and Ellipsis) and use that to index the array:

In [222]: ind=(slice(3,5),slice(None),1,Ellipsis)
In [223]: x[ind].shape
Out[223]: (2, 3, 32)
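The same idea lets you build the index tuple in code, which is one way to slice along a given dimension regardless of how many other dimensions there are. This is only a sketch: `slice_along` is a hypothetical helper, not a NumPy function, and the array shapes are assumed for illustration.

```python
import numpy as np

def slice_along(arr, index, axis):
    """Hypothetical helper: select `index` on `axis`, keeping every other axis whole."""
    ind = [slice(None)] * arr.ndim  # start with ':' on every axis
    ind[axis] = index               # replace one entry with the wanted index
    return arr[tuple(ind)]          # index with a tuple, not a list

# Assumed shape, chosen so that the tuple from the session above applies.
x = np.zeros((6, 3, 4, 32))

ind = (slice(3, 5), slice(None), 1, Ellipsis)
shape_from_tuple = x[ind].shape  # (2, 3, 32)

# Slicing on axis 1 works for any number of dimensions.
shape_4d = slice_along(x, 0, axis=1).shape
shape_2d = slice_along(np.zeros((6, 3)), 0, axis=1).shape
```

Building the list with `slice(None)` entries and converting it to a tuple at the end avoids hard-coding the number of dimensions.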

I think all your cases can be handled with:

result = np.zeros_like(input_data)
for i in range(input_data.shape[0]):
    result[i] = processing_funcs(input_data[i])

or

result = [processing_funcs(subdata) for subdata in input_data]
result = np.array(result)

Iterating on an array effectively indexes on the first axis. Collecting the results in a list and then passing that to np.array is a standard way of creating an array. np.array normally combines the elements of the list into a new array along a new first dimension. You could also use concatenate, but that might require adding an initial dimension to each result first:

result = [processing_funcs(subdata)[None,...] for subdata in input_data]
result = np.concatenate(result, axis=0)
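Because the list-based approach never preallocates the output, it also covers the case where the processed data has a different size from the input. Here is a minimal sketch of that, assuming a stand-in `processing_funcs` that halves the last axis (the real function is not shown in the question) and using np.stack, which is a swapped-in alternative to np.array for joining equal-shaped results along a new first axis. `process_all` is a hypothetical wrapper name.

```python
import numpy as np

def processing_funcs(block):
    """Stand-in for the real processing: halves the length of the last axis."""
    return block[..., ::2] * 2.0

def process_all(input_data):
    # Iterating over an array yields input_data[0], input_data[1], ...
    results = [processing_funcs(sub) for sub in input_data]
    # np.stack joins equal-shaped results along a new leading axis.
    return np.stack(results, axis=0)

# The same function handles 2-D, 3-D and 4-D input without any if statements.
out2 = process_all(np.ones((5, 8)))
out3 = process_all(np.ones((5, 8, 6)))
out4 = process_all(np.ones((5, 8, 6, 4)))
```

The output shape is whatever `processing_funcs` returns, with the first input dimension in front, so nothing about the final sizes needs to be known in advance.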
