Reputation: 73

Numpy extract arbitrary subarray along one axis

For example, I have a 3-d array:

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]

 [[24 25 26 27]
  [28 29 30 31]
  [32 33 34 35]]]

and the final array I want:

[[[ 0  1]
  [ 4  5]]

 [[18 19]
  [22 23]]

 [[26 27]
  [30 31]]]

Is there any efficient way to get array like that without using for loop ?

The reason to ask this question is that if we want to get arbitrary elements alone one axis, like:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

I can get array [ 2 7 9] using a[np.arange(a.shape[0]), [2, 3, 1]] So I wonder if there is a similar way when elements become subarrays.

Upvotes: 4

Answers (4)

Jeff Dong

Reputation: 73

When ask this question I didn't expect it would use tricky methods like ix_ and as_strided , mentioned by hpaulj and B. M., which was quite hard to understand by beginners like me..

I come up with a more understandable but not so efficient way inspired by B. M.'s

a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]]

combining with advanced indexing.

What we need here is just translating a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]] to an advanced indexing of a. In our case is:

a[[0 0 0 0 1 1 1 1 2 2 2 2],
  [0 0 1 1 1 1 2 2 0 0 1 1],
  [0 1 0 1 2 3 2 3 2 3 2 3]]

A simple translate function can be like this:

(cartesian function in the code is from pv.'s answer for this question)

# a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]]
def translate_idx(idx1=[0, 1, 2], idx2=[(0,2),(1,3),(0,2)], idx3=[(0,2),(2,4),(2,4)]):
    # first we need to get the combinations for correponding intervals
    # e.g for (1,3) and (2,4) we get [[1, 2], [1, 3], [2, 2], [2, 3]]
    idx23 = []
    for i in range(len(idx2)):
        # the function cartesian here just generates all conbinations from some arrays
        idx23.append(cartesian((np.arange(*idx2[i]), np.arange(*idx3[i]))))
    idx23 = np.concatenate(idx23).T
    # now we get index for 2nd and 3rd axis
    # [[0 0 1 1 1 1 2 2 0 0 1 1]
    #  [0 1 0 1 2 3 2 3 2 3 2 3]]
    # we can repeat 4 times for idx1 and append idx23
    step = (idx2[0][1] - idx2[0][0]) * (idx3[0][1] - idx3[0][0])
    idx123 = np.append(np.array(idx1).repeat(step).reshape((1, -1)), idx23, axis=0)
    return idx123

Then I can use

idx = translate_idx()
a[idx[0], idx[1], idx[2]]

to get what I want.

Though this method is not quite efficient, I think it reveals some correlation between this kind of problems and advanced indexing. In practice as_strided is definitely a better choice.

Thank you all guys for the detailed answers :)

Upvotes: 0

hpaulj

Reputation: 231385

The overlapping blocks on the 2nd dimension require something like as_strided. B.M. has an earlier solution with as_strided, but I find it hard to understand. It also looks dangerous, since it displays out of bounds data. I'm approaching the task in smaller steps.

Selection along the last dimension for 'cols' [0,1] and [2,3] is relatively easy. Reshaping makes it easier.

In [27]: A=np.arange(36).reshape(3,3,4)
In [28]: A1=A.reshape(3,3,2,2)
In [29]: A2=A1[[0,1,2],:,[0,1,1],:]

In [30]: A2
Out[30]: 
array([[[ 0,  1],
        [ 4,  5],
        [ 8,  9]],

       [[14, 15],
        [18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31],
        [34, 35]]])

Focusing on one subarray, I found I could see it as 2 overlapping arrays:

In [59]: as_strided(A2[0],shape=(2,2,2),strides=(8,8,4))
Out[59]: 
array([[[0, 1],
        [4, 5]],

       [[4, 5],
        [8, 9]]])

np.lib.stride_tricks.as_strided is a tricky function to use. I've seen it mostly used for moving-windows applications.

Applied to the whole array:

In [65]: A3=as_strided(A2,shape=(3,2,2,2),strides=(24,8,8,4))

In [66]: A3
Out[66]: 
array([[[[ 0,  1],
         [ 4,  5]],
         ...
        [[30, 31],
         [34, 35]]]])

And the target:

In [71]: A3[[0,1,2],[0,1,0]]
Out[71]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31]]])

This can be put together in a way that allows assignment (shape and strides are adapted from the values for A1)

In [105]: A1 = A.reshape(3,3,2,2)
In [106]: A1s = as_strided(A1, shape=(3,2,2,2,2), strides=(48,16,16,8,4))

In [107]: A1s[[0,1,2],[0,1,0],:,[0,1,1],:]
Out[107]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31]]])

Test of assignment:

In [108]: A1s[[0,1,2],[0,1,0],:,[0,1,1],:] = 99

In [109]: A
Out[109]: 
array([[[99, 99,  2,  3],
        [99, 99,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 99, 99],
        [20, 21, 99, 99]],

       [[24, 25, 99, 99],
        [28, 29, 99, 99],
        [32, 33, 34, 35]]])

If you don't require assignment, it's easier to understand the logic without striding:

np.array([A2[0,:-1], A2[1,1:], A2[2,:-1]])

slices=(slice(-1),slice(1,None))    
np.array([A2[i,slices[j]] for i,j in zip([0,1,2],[0,1,0])])

===========================

Earlier stumbling around for an answer:

I think it can be extracted with advanced indexing.

Ind = np.ix_([0,0,1,1,2,2], [0,1,1,2,0,1], [0,1,2,3,2,3])
a[ind]

The values may need tweaking, since I can't test it here.

The idea is to enumerate which rows , cols are needed in each dimension, and use ix_ (I think that's the right function) to add newaxis so they broadcast together.

Your idea of generalizing the 2d case is right. The trick is figure out when you need np.array([0,1,2]), and when you need to rotate it with ix_ or [:,None],etc. I end up testing various ideas.

It may need to be

  Ind = np.ix_([0,1,2], [0,1,1,2,0,1], [0,1,2,3,2,3])

No this isn't quite right. It will produce a 3x6x6; you want 3x2x2.

Reshaping to 3x3x2x2 may make indexing one the last dimension easier.

The striding trick mentioned in another answer may help turn the overlapping selection on 2nd into a similar 2x2 block. But I'd have to play around with this.

What I'm imaging is an index tuple of the form ([1,2,3],[?,?,?],:,[?,?,?],:)

Upvotes: 2

B. M.

Reputation: 18638

A simple idea would be a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]] , but it is not implemented.

A workaround can be obtained with np.lib.stride_tricks.as_strided. Just define :

ast=np.lib.stride_tricks.as_strided(a,a.shape*2,a.strides*2)
#ast.shape is (3, 3, 4, 3, 3, 4).

Then you can define separately origin and size of blocs:

In [4]: ast[[0,1,2],[0,1,0],[0,2,2],0,:2,:2]
Out[4]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31]]])

Some explanations :

origin :

You want to find the blocs beginning with elements 0,18,26 .

their indices in the reshaped array can be found by :

In [316]: np.unravel_index([0,18,26],a.shape)
Out[316]: 
(array([0, 1, 2], dtype=int64),
 array([0, 1, 0], dtype=int64),
 array([0, 2, 2], dtype=int64))

ast[[0,1,2],[0,1,0],[0,2,2]] is a (3,3,3,4) array. each (3,3,4) array begin with a selected element.

array([[[[          0,           1,           2,           3],
         [          4,           5,           6,           7],
         [          8,           9,          10,          11]],

        [[         12,          13,          14,          15],
         [         16,          17,          18,          19],
         [         20,          21,          22,          23]],

        [[         24,          25,          26,          27],
         [         28,          29,          30,          31],
         [         32,          33,          34,          35]]],


       [[[         18,          19,          20,          21],
         [         22,          23,          24,          25],
         [         26,          27,          28,          29]],

        [[         30,          31,          32,          33],
         [         34,          35,    23592960,       18335],
         [  697780028, -2147480064,   540876865,  1630433390]],

        [[ 2036429426,   538970664,   538976288,  1532698656],
         [  741355058,   808334368,   775168044,   874523696],
         [  744304686,   538976266,   538976288,   811278368]]],


       [[[         26,          27,          28,          29],
         [         30,          31,          32,          33],
         [         34,          35,    23592960,       18335]],

        [[  697780028, -2147480064,   540876865,  1630433390],
         [ 2036429426,   538970664,   538976288,  1532698656],
         [  741355058,   808334368,   775168044,   874523696]],

        [[  744304686,   538976266,   538976288,   811278368],
         [  539766830,   741355058,   808333600,   775036972],
         [  170679600,   538976288,   538976288,   774920992]]]])

As said in the documentation, as_strided is a dangerous hack, and must be used with care, since it can access elements that are not in the array if badly used. Next step will ensure the selection of valid elements.

size :

The interesting elements are the four in the upper left corner in each first bloc. So ast[[0,1,2],[0,1,0],[0,2,2],0,:2,:2] select them.

You can also define sized blocs like that: bloc122=ast[...,0,:2,:2] (bloc122.shape is (3, 3, 4, 2, 2) ) :

In [8]: bloc122[[0,1,2],[0,1,0],[0,2,2]]=0

In [9]: a
Out[9]: 
array([[[ 0,  0,  2,  3],
        [ 0,  0,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17,  0,  0],
        [20, 21,  0,  0]],

       [[24, 25,  0,  0],
        [28, 29,  0,  0],
        [32, 33, 34, 35]]])

Upvotes: 3

Kasravnd

Reputation: 107297

You can use indexing to get the expected out put separately:

>>> a = np.array([[[ 0, 1, 2, 3],
...   [ 4, 5, 6, 7],
...   [ 8, 9,10,11]],
... 
...  [[12,13,14,15],
...   [16,17,18,19],
...   [20,21,22,23]],
... 
...  [[24,25,26,27],
...   [28,29,30,31],
...   [32,33,34,35]]])

>>> a[0,:2,:2]
array([[0, 1],
       [4, 5]])
>>> a[1,1:,2:]
array([[18, 19],
       [22, 23]])
>>> a[2,:2,2:]
array([[26, 27],
       [30, 31]])
>>>

Upvotes: 0

Numpy extract arbitrary subarray along one axis

Answers (4)

Related Questions