218

Reputation: 1824

Numpy matrices - how to handle arbitrary size

I have some code which carries out various processing functions on a matrix of input data. The input data may be 2, 3 or 4D.

I need to extract (n-1)-dimensional sub-matrices from the input data for processing, and then store the processed results in another matrix. What is the best way to do this so that it handles the varying dimensionality of the input data?

For example, at the moment I have code of the following form:

import numpy as np

if number_dims == 2:
  output_matrix = np.zeros([size_dim1,final_size_dim2])
  for i in range(0,dim1,1):
      data_to_process = input_data[i,:]
      output_matrix[i,:] = processing_funcs(data_to_process)

if number_dims == 3:
  output_matrix = np.zeros([size_dim1,final_size_dim2,final_size_dim3])
  for i in range(0,dim1,1):
      data_to_process = input_data[i,:,:]
      output_matrix[i,:,:] = processing_funcs(data_to_process)

if number_dims == 4:
  output_matrix = np.zeros([size_dim1,final_size_dim2,final_size_dim3,final_size_dim4])
  for i in range(0,dim1,1):
      data_to_process = input_data[i,:,:,:]
      output_matrix[i,:,:,:] = processing_funcs(data_to_process)

Is there a good way to do this in Python without the repeated if statements? The complication is that the final data size in the n-1 remaining dimensions is not the same as the input size, so I can't, for example, just do:

output_matrix = np.zeros(np.shape(input_data))

It would also be good to have a way to take slices along the first dimension, regardless of how many other dimensions there are.

Upvotes: 0

Views: 602

Answers (2)

hpaulj

Reputation: 231375

These are functionally the same:

output_matrix[i,:,:,:]
output_matrix[i,...]
output_matrix[i]

or more generally:

x[:,i,j,:,:]
x[:,i,j,...]
x[:,i,j]

As long as it is clear which dimensions are being indexed, trailing ':' can be omitted or replaced with an ellipsis ('...'). An ellipsis can also be used at the beginning or in the middle of an index, again provided the expression is not ambiguous (only one ellipsis is allowed per index).
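A small check of the equivalence described above. The array x and the index i are made up for illustration only:

```python
import numpy as np

# Hypothetical 4D test array
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
i = 1

# All three select the same sub-array of shape (3, 4, 5)
a = x[i, :, :, :]
b = x[i, ...]
c = x[i]

# Ellipsis in the middle: index only the first and last axes
d = x[i, ..., 0]   # same as x[i, :, :, 0]
```

Because none of these forms mention the number of dimensions explicitly, the same expression works for 2D, 3D or 4D input.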

np.take and np.put are also useful when indexing on certain axes.
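As a rough sketch of the np.take route: with a scalar index and an axis argument it returns the (n-1)-dimensional slice along that axis. The helper name take_slice and the test array are assumptions for illustration, not part of the question:

```python
import numpy as np

def take_slice(arr, index, axis=0):
    """Return the (n-1)-dimensional slice at `index` along `axis`."""
    return np.take(arr, index, axis=axis)

# Hypothetical 3D test array
x = np.arange(24).reshape(2, 3, 4)

s0 = take_slice(x, 1, axis=0)   # same values as x[1]
s1 = take_slice(x, 2, axis=1)   # same values as x[:, 2, :]
```

This is mainly useful when the axis to slice along is only known at run time.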

You can also build a tuple of indices (integers, slice objects and Ellipsis) and index with that:

In [222]: ind=(slice(3,5),slice(None),1,Ellipsis)
In [223]: x[ind].shape
Out[223]: (2, 3, 32)

I think all your cases can be handled with:

# zeros_like assumes the output has the same shape and dtype as the input
result = np.zeros_like(input_data)
for i in range(input_data.shape[0]):
    result[i] = processing_funcs(input_data[i])

or

result = [processing_funcs(subdata) for subdata in input_data]
result = np.array(result)    

Iterating on an array effectively indexes on the first axis. Collecting the results in a list and then passing that to np.array is a standard way of creating an array; it also avoids having to know the output shape in advance. np.array normally combines the elements of the list into a new array along a new leading dimension. You could also use concatenate, but that requires adding an initial dimension to each element first:

result = [processing_funcs(subdata)[None,...] for subdata in input_data]
result = np.concatenate(result, axis=0)
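Another variation on the list-based approach is np.stack, which joins arrays along a new axis and so skips the manual [None,...] step. A minimal sketch; the processing_funcs below is a hypothetical stand-in that changes the size of the last axis, just to show that the output shape need not match the input:

```python
import numpy as np

def processing_funcs(sub):
    # Hypothetical stand-in: drops the last element along the final axis
    return sub[..., :-1] * 2.0

input_data = np.ones((4, 3, 5))

# Each processed slice has shape (3, 4); stacking adds the leading axis back
result = np.stack([processing_funcs(sub) for sub in input_data], axis=0)
```

All processed slices must share the same shape for this to work, which is also true of the np.array and np.concatenate versions.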

Upvotes: 1

B. M.

Reputation: 18628

for i in range(dim1): 
      output_matrix[i] = processing_funcs(input_data[i])

should work regardless of the number of other dimensions. For the shape, you can probably do something like output_matrix = np.zeros(f(input_data.shape)), where f is a function you write that maps the input shape to the output shape.
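If the output shape is awkward to compute up front, one possible alternative is to process the first slice and size the output from that result. This is only a sketch: the name process_all is made up, and it assumes the input has at least one slice along the first axis and that every slice produces a result of the same shape.

```python
import numpy as np

def process_all(input_data, func):
    """Apply func to each slice along axis 0 and collect the results."""
    # Process the first slice to discover the output shape and dtype
    first = np.asarray(func(input_data[0]))
    out = np.zeros((input_data.shape[0],) + first.shape, dtype=first.dtype)
    out[0] = first
    for i in range(1, input_data.shape[0]):
        out[i] = func(input_data[i])
    return out
```

For example, process_all(input_data, processing_funcs) would replace all three if branches in the question.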

Upvotes: 1
