Reputation: 431
I'm trying to split a multidimensional array (array)
import numpy as np
shape = (3, 4, 4, 2)
array = np.random.randint(0,10,shape)
into an array (new_array) with shape (3,2,2,2,2,2), where dimension 1 has been split in two (dimensions 1 and 2) and dimension 2 of array has been split in two (dimensions 3 and 4).
So far I got a working method which is:
div_x = 2
div_y = 2
new_dim_x = shape[1]//div_x
new_dim_y = shape[2]//div_y
new_array_split = np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
I'm also looking into using reshape:
new_array_reshape = array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
The reshape method is faster than the split method:
%timeit array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
2.16 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
58.3 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
However, I cannot get the same results, because of the last dimension:
print('Reshape method')
print(new_array_reshape[1,0,0,...])
print('\nSplit method')
print(new_array_split[1,0,0,...])
Reshape method
[[[2 2]
  [4 3]]

 [[3 5]
  [5 9]]]

Split method
[[[2 2]
  [4 3]]

 [[5 3]
  [9 8]]]
The split method does exactly what I want (I checked it number by number), but not at the speed I would like.
QUESTION
Is there a way to achieve the same results as the split method, using reshape or any other approach?
CONTEXT
The array is actually flow data from image processing, where the first dimension of array is time, the second dimension is the x coordinate (4), the third dimension is the y coordinate (4), and the fourth dimension (2) holds the magnitude and phase of the flow.
I would like to split the images (coordinate x and y) into subimages making an array of pictures of 2x2 so I can analyse the flow more locally, perform averages, clustering, etc.
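As an illustration of the kind of local analysis I have in mind, here is a minimal sketch of a per-subimage average. It assumes the block layout that the split method produces, (block_x, block_y, time, x_in_block, y_in_block, magnitude/phase); the names blocks and local_mean are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.integers(0, 10, (3, 4, 4, 2))   # (time, x, y, magnitude/phase)

# Blocks in the layout the split method produces:
# (block_x, block_y, time, x_in_block, y_in_block, magnitude/phase)
blocks = np.array([
    np.split(sub, 2, axis=2)
    for sub in np.split(array, 2, axis=1)
])

# Average over the pixels of each 2x2 subimage (the two within-block axes).
local_mean = blocks.mean(axis=(3, 4))
print(local_mean.shape)   # (2, 2, 3, 2)
```

Each entry of local_mean is then one value per block, time step, and flow component.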
This splitting is going to be performed many times, which is why I'm looking for an efficient solution. I believe reshape is probably the way to go, but I'm open to any other option.
Upvotes: 2
Views: 656
Reputation: 14399
For your use case I'm not sure reshape is the best option. If you want to be able to locally average and cluster, you might want a window function:
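import numpy as np
from skimage.util import view_as_windows

def window_over(arr, size=2, step=2, axes=(1, 2)):
    # The window spans the full extent of every axis except those in `axes`
    wshp = list(arr.shape)
    for a in axes:
        wshp[a] = size
    # squeeze() drops the length-1 axes left by the full-extent dimensions
    return view_as_windows(arr, wshp, step).squeeze()

test = np.random.randint(0, 10, (3, 4, 4, 2))  # stand-in with the question's shape
window_over(test).shape
Out[]: (2, 2, 3, 2, 2, 2)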
Your output axes can then be rearranged however you want using transpose. The benefit of this approach is that you can also get the intermediate windows:
window_over(test, step = 1).shape
Out[]: (3, 3, 3, 2, 2, 2)
That includes the 2x2 windows that overlap, so you get 3x3 results.
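The same overlapping windows can also be sketched with NumPy alone, assuming NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available. Note that the axis order differs from window_over: the two window axes are appended at the end. The names arr, windows and blocks are only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(3 * 4 * 4 * 2).reshape(3, 4, 4, 2)   # (time, x, y, channel)

# All 2x2 windows over axes 1 and 2. The window axes are appended at the end,
# giving (time, n_x, n_y, channel, win_x, win_y).
windows = sliding_window_view(arr, (2, 2), axis=(1, 2))
print(windows.shape)   # (3, 3, 3, 2, 2, 2)

# Keeping every second window start leaves only the non-overlapping blocks.
blocks = windows[:, ::2, ::2]
print(blocks.shape)    # (3, 2, 2, 2, 2, 2)
```

Both results are views on arr, so no data is copied until the windows are modified or explicitly copied.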
Since overlapping is possible, the dimension size also doesn't need to be divisible by the window size:
window_over(test, size = 3, step = 1).shape
Out[]: (2, 2, 3, 3, 3, 2)
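If only the non-overlapping blocks are needed, a plain reshape can reproduce the output of the split method. The following is a minimal sketch: the block index and the within-block index of each coordinate are kept adjacent in the reshape (div_x next to new_dim_x, div_y next to new_dim_y), and the two block axes are then transposed to the front. The names trimmed, blocks and reference are only illustrative.

```python
import numpy as np

shape = (3, 4, 4, 2)
rng = np.random.default_rng(0)
array = rng.integers(0, 10, shape)

div_x, div_y = 2, 2                  # number of blocks along x and y
new_dim_x = shape[1] // div_x        # block size along x
new_dim_y = shape[2] // div_y        # block size along y

# Trim x and y to a multiple of the block size, as in the question.
trimmed = array[:, :div_x * new_dim_x, :div_y * new_dim_y, ...]

# Split each coordinate into (block index, index within block), keeping the
# pair adjacent, then move both block indices to the front.
blocks = (
    trimmed
    .reshape(shape[0], div_x, new_dim_x, div_y, new_dim_y, shape[-1])
    .transpose(1, 3, 0, 2, 4, 5)
)

# Reference result from the question's split method.
reference = np.array([
    np.split(sub, div_y, axis=2)
    for sub in np.split(trimmed, div_x, axis=1)
])

print(blocks.shape)                       # (2, 2, 3, 2, 2, 2)
print(np.array_equal(blocks, reference))  # True
```

The reshape in the question places div_y directly after div_x, which mixes x and y indices within each block; that is why its output differs from the split method.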
Upvotes: 1