Ger

Splitting a multidimensional array in NumPy

I'm trying to split a multidimensional array (array)

import numpy as np

shape = (3, 4, 4, 2)
array = np.random.randint(0,10,shape)

into an array (new_array) with shape (3,2,2,2,2,2), where dimension 1 of array has been split into two (dimensions 1 and 2 of new_array) and dimension 2 of array has been split into two (dimensions 3 and 4).
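A loop-based reference makes the intended index mapping explicit. It assumes, as the shape (3,2,2,2,2,2) suggests, that new_array[t, i, a, j, b, c] holds array[t, 2*i + a, 2*j + b, c], with i and j selecting the 2x2 block and a and b the position inside it. The helper name reference_split and the random seed are illustrative only. Under that assumption the layout coincides with a plain reshape of a C-contiguous array:

```python
import numpy as np

shape = (3, 4, 4, 2)
rng = np.random.default_rng(0)
array = rng.integers(0, 10, shape)

def reference_split(arr, bx=2, by=2):
    """Hypothetical loop-based reference:
    out[t, i, a, j, b, c] = arr[t, i*bx + a, j*by + b, c].

    bx, by are the block (sub-image) sizes along x and y; the x and y
    sizes of arr are assumed to be divisible by them.
    """
    T, X, Y, C = arr.shape
    out = np.empty((T, X // bx, bx, Y // by, by, C), dtype=arr.dtype)
    for i in range(X // bx):          # block index along x
        for a in range(bx):           # position inside the block along x
            for j in range(Y // by):  # block index along y
                for b in range(by):   # position inside the block along y
                    out[:, i, a, j, b, :] = arr[:, i * bx + a, j * by + b, :]
    return out

new_array = reference_split(array)
print(new_array.shape)  # (3, 2, 2, 2, 2, 2)

# For a C-contiguous array, this layout is exactly what a plain reshape gives.
print(np.array_equal(new_array, array.reshape(3, 2, 2, 2, 2, 2)))  # True
```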

So far I have a working method:

div_x = 2
div_y = 2
new_dim_x = shape[1]//div_x
new_dim_y = shape[2]//div_y

new_array_split = np.array([
    np.split(each_sub, axis=2, indices_or_sections=div_y)
    for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)],
                             axis=1, indices_or_sections=div_x)
])

I'm also looking into using reshape:

new_array_reshape = (array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...]
                     .reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1])
                     .transpose(1, 2, 0, 3, 4, 5))

The reshape method is much faster than the split method (about 2 µs versus 58 µs):

%timeit array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
2.16 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
58.3 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

However, the two methods do not give the same results:

print('Reshape method')
print(new_array_reshape[1,0,0,...])
print('\nSplit method')
print(new_array_split[1,0,0,...])
 
Reshape method
[[[2 2]
  [4 3]]
 [[3 5]
  [5 9]]]

Split method
[[[2 2]
  [4 3]]
 [[5 3]
  [9 8]]]

The split method does exactly what I want (I checked the result number by number), but it is not as fast as I would like.
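One likely cause of the mismatch is the order of the new axes passed to reshape. reshape reads and writes elements in row-major (C) order, so splitting an axis of length 4 into (2, 2) makes the first new axis the block index and the second the position inside the block. With all sizes equal to 2, as here, the reshape to (shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]) makes axes 1 and 2 both index x and axes 3 and 4 both index y, instead of (x-block, y-block, x, y). A toy 4x4 grid whose values encode their own coordinates (an illustrative assumption, not the question's data) shows the effect:

```python
import numpy as np

# Toy 4x4 "image" whose values encode their position: value = 10*x + y.
grid = 10 * np.arange(4)[:, None] + np.arange(4)[None, :]

# Row-major reshape (4, 4) -> (2, 2, 2, 2) gives
# blocks4[p, q, r, s] == grid[2*p + q, 2*r + s]:
# axes 0 and 1 both come from x, axes 2 and 3 both come from y.
blocks4 = grid.reshape(2, 2, 2, 2)

# Reading the axes as (x-block, y-block, x-in-block, y-in-block), as the
# question's reshape does, picks part of one image row instead of a 2x2 block:
print(blocks4[1, 0])
# [[20 21]
#  [22 23]]

# Reading them as (x-block, x-in-block, y-block, y-in-block) and moving the
# two block axes to the front gives real 2x2 sub-images:
sub_images = blocks4.transpose(0, 2, 1, 3)
print(sub_images[1, 0])
# [[20 21]
#  [30 31]]
```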

QUESTION

Is there a way to achieve the same results as the split method, using reshape or any other approach?

CONTEXT

The array actually holds flow data from image processing: the first dimension of array is time, the second is the x coordinate (size 4), the third is the y coordinate (size 4), and the fourth (size 2) holds the magnitude and phase of the flow.

I would like to split each image (the x and y coordinates) into 2x2 sub-images, so I can analyse the flow more locally: compute averages, clustering, etc.
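As a sketch of the local averaging mentioned above, assuming the (time, x, y, channel) layout and x and y sizes divisible by the block size, per-block means can be taken directly on a reshaped view without building the split array first. The data here is random stand-in data, and the phase channel is averaged as a plain number, which ignores angle wrap-around:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: (time, x, y, [magnitude, phase]); random values for illustration.
flow = rng.random((3, 4, 4, 2))

bx, by = 2, 2  # sub-image size along x and y
T, X, Y, C = flow.shape

# Split x into (x-block, x-in-block) and y into (y-block, y-in-block);
# for a contiguous array this is a view, so no data is copied.
blocks = flow.reshape(T, X // bx, bx, Y // by, by, C)

# Average over the positions inside each block (axes 2 and 4).
# Result layout: (time, x-block, y-block, channel).
block_means = blocks.mean(axis=(2, 4))
print(block_means.shape)  # (3, 2, 2, 2)
```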

This splitting will be performed many times, which is why I'm looking for an optimal and efficient solution. I suspect reshape is the way to go, but I'm open to any other option.

Answers (2)

Daniel F

For your use case I'm not sure reshape is the best option. If you want to be able to locally average and cluster, you might want a window function:

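import numpy as np
from skimage.util import view_as_windows

test = np.random.randint(0, 10, (3, 4, 4, 2))  # the array from the question

def window_over(arr, size=2, step=2, axes=(1, 2)):
    # Window shape: full size on every axis except the windowed ones.
    wshp = list(arr.shape)
    for a in axes:
        wshp[a] = size
    # view_as_windows adds one leading axis per dimension (length 1 for the
    # full-size axes); squeeze() drops those (and any other length-1 axes).
    return view_as_windows(arr, wshp, step).squeeze()

window_over(test).shape
Out[]: (2, 2, 3, 2, 2, 2)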

Your output axes can then be rearranged how you want using transpose. The benefit of this is that you can get the intermediate windows:

window_over(test, step = 1).shape
Out[]: (3, 3, 3, 2, 2, 2)

This includes the overlapping 2x2 windows, so you get a 3x3 grid of windows instead of 2x2.

Since windows can overlap, the dimension size also doesn't need to be divisible by the window size:

window_over(test, size = 3, step = 1).shape
Out[]: (2, 2, 3, 3, 3, 2)
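If scikit-image is not available, NumPy 1.20 and later provide numpy.lib.stride_tricks.sliding_window_view, which builds the same kind of overlapping-window view. This is a swapped-in alternative to view_as_windows, not what the answer above uses. The sketch takes 2x2 windows over axes 1 and 2, keeps every second window start to get non-overlapping blocks, and transposes to the split method's layout (x-block, y-block, time, x, y, channel):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

rng = np.random.default_rng(0)
test = rng.integers(0, 10, (3, 4, 4, 2))

# All 2x2 windows over axes 1 and 2; the window axes are appended at the end:
# (time, x-start, y-start, channel, win_x, win_y) = (3, 3, 3, 2, 2, 2)
windows = sliding_window_view(test, window_shape=(2, 2), axis=(1, 2))

# Keep every second window start to get non-overlapping blocks: (3, 2, 2, 2, 2, 2)
blocks = windows[:, ::2, ::2]

# Reorder to (x-block, y-block, time, win_x, win_y, channel) like the split method.
blocks = blocks.transpose(1, 2, 0, 4, 5, 3)

# Cross-check against the question's split method.
split = np.array([np.split(s, 2, axis=2) for s in np.split(test, 2, axis=1)])
print(np.array_equal(blocks, split))  # True
```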

Divakar

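Reshape so that each of the x and y axes is split into (block index, position within the block), in that order, then permute the axes so the two block axes come first, matching the split method's layout: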

array.reshape(3,2,2,2,2,2).transpose(1,3,0,2,4,5)
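The same reshape-and-transpose idea, written against the question's variables (div_x, div_y, new_dim_x, new_dim_y) and including the cropping the question applies when sizes are not divisible, might look like the following sketch. The name split_blocks is hypothetical; the check compares it with the question's split method. For a contiguous array that needs no cropping, reshape and transpose both return views, which np.shares_memory confirms:

```python
import numpy as np

def split_blocks(arr, div_x, div_y):
    """Hypothetical helper: cut arr (time, x, y, channel) into div_x * div_y
    sub-images, returned as (div_x, div_y, time, new_dim_x, new_dim_y, channel),
    the same layout as the question's split method."""
    t, x, y, c = arr.shape
    new_dim_x, new_dim_y = x // div_x, y // div_y
    # Crop trailing rows/columns that do not fill a whole block (as the question does).
    cropped = arr[:, :div_x * new_dim_x, :div_y * new_dim_y]
    # Split x into (block, position-in-block) and y likewise, in that order.
    blocks = cropped.reshape(t, div_x, new_dim_x, div_y, new_dim_y, c)
    # Move the two block axes to the front.
    return blocks.transpose(1, 3, 0, 2, 4, 5)

def split_method(arr, div_x, div_y):
    """The question's np.split approach, kept for comparison."""
    new_dim_x, new_dim_y = arr.shape[1] // div_x, arr.shape[2] // div_y
    cropped = arr[:, :new_dim_x * div_x, :new_dim_y * div_y]
    return np.array([np.split(s, div_y, axis=2)
                     for s in np.split(cropped, div_x, axis=1)])

rng = np.random.default_rng(0)
array = rng.integers(0, 10, (3, 4, 4, 2))
print(np.array_equal(split_blocks(array, 2, 2), split_method(array, 2, 2)))  # True

# Sizes that are not divisible are cropped first (x: 7 -> 6, y: 5 -> 4 here).
odd = rng.integers(0, 10, (3, 7, 5, 2))
print(split_blocks(odd, 3, 2).shape)  # (3, 2, 3, 2, 2, 2)
print(np.array_equal(split_blocks(odd, 3, 2), split_method(odd, 3, 2)))  # True

# With no cropping needed, the result is a view of the input (no copy).
print(np.shares_memory(array, split_blocks(array, 2, 2)))  # True
```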
