AGentleRose

Reputation: 133

Vectorized Creation of 2D Array from 3D array Given 1D Array of Indices

Given an array x of shape (2,n,m) and a pair of indexes [i,j] with i,j < n, I am trying to obtain an array of shape (2,m) whose first row is x[0,i] and whose second row is x[1,j]. This is a test case for generalizing to an array of shape (b,n,m) and a list of b indexes.
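To pin down the target, here is a small sketch of the desired result written with an explicit loop (the thing that cannot be used inside Keras). It assumes illustrative sizes b=2, n=4, m=2 and indexes i=3, j=1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 4, 2))   # shape (b, n, m) with b=2, n=4, m=2
idx = [3, 1]                # [i, j]: one row index per leading slice

# Desired output: row idx[k] of slice k, stacked -> shape (b, m)
expected = np.stack([x[k, idx[k]] for k in range(x.shape[0])])
print(expected.shape)  # (2, 2)
```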

The obvious choice for this operation is np.choose, but it behaves differently than expected. I want to pair row i of the first array with row j of the second array. However, np.choose([i,j],x) instead takes the first column from the array at index i and the second column from the array at index j (as the code below shows), producing an array of shape (n,m). The task is easy with a for loop, but because of the use case (a custom function on a tensor inside Keras, where iteration is not possible) I cannot use one. Is there a vectorized way to perform this operation, using either the Keras backend functions or NumPy? I'm currently looking at using "map" for this, and will update with my own answer if I figure it out.

Here's a code snippet showing you how np.choose handles the (2,n,m) array:

>>> import numpy as np
>>> x = np.random.rand(2,4,2)
>>> choices = [3,1]
>>> np.choose(choices,x)
ValueError: invalid entry in choice array
>>> np.choose([0,0],x)   # returns x[0,:,0] and x[0,:,1] as columns, shape (4,2)
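The behaviour above follows from how np.choose broadcasts: x is treated as a sequence of two (n,m) choice arrays, and the index list (shape (2,)) is broadcast against them along the last axis. So output[r, c] = x[a[c], r, c], and every entry of a must be 0 or 1, which is why [3,1] raises a ValueError. A minimal sketch that checks this, assuming m = 2 so the shapes broadcast:

```python
import numpy as np

x = np.random.rand(2, 4, 2)   # two "choice" arrays, each of shape (4, 2)
a = np.array([1, 0])          # entries must be < x.shape[0] == 2

out = np.choose(a, x)         # a broadcasts to shape (4, 2): every row is [1, 0]

# Rebuild the same result by hand: column c comes from choice array a[c]
manual = np.empty_like(out)
for r in range(x.shape[1]):
    for c in range(x.shape[2]):
        manual[r, c] = x[a[c], r, c]

print(np.array_equal(out, manual))  # True
```

In other words, np.choose selects per element (here per column), not per row, so it cannot express "row i of slice 0, row j of slice 1".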

Upvotes: 1

Views: 125

Answers (1)

B. M.

Reputation: 18628

I believe you can use advanced indexing. For example:

import numpy as np 
x = np.random.randint(0,10,(2,4,3))

x is:

[[[0 4 1]
  [8 8 1]
  [3 3 6]
  [4 7 8]]

 [[7 1 2]
  [5 9 9]
  [0 4 0]
  [7 8 3]]]

Now x[[0,1],[3,1],:] is:

[[4 7 8]
 [5 9 9]]
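The two index lists are paired element-wise, so the general case just pairs np.arange(b) with the per-slice row indexes and keeps the last axis as a full slice. A sketch, with pick_rows as a hypothetical helper name:

```python
import numpy as np

def pick_rows(x, idx):
    """Return out[k] = x[k, idx[k]] for x of shape (b, n, m) and idx of length b."""
    idx = np.asarray(idx)
    b = x.shape[0]
    # Both index arrays have shape (b,), so they are paired element-wise;
    # the trailing ':' keeps the whole last axis -> result shape (b, m).
    return x[np.arange(b), idx, :]

x = np.random.randint(0, 10, (2, 4, 3))
print(pick_rows(x, [3, 1]))  # rows x[0, 3] and x[1, 1]
```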

This can be extended to a (b,n,m) problem:

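import numpy as np
x = np.random.randint(0,10,(100,200,300))   # b=100, n=200, m=300
choices = np.random.randint(0,200,(100))    # one row index per slice

def loop():
    # reference implementation: copy row choices[i] of slice i
    res = np.empty((100,300),int)
    for i in range(100):
        res[i] = x[i,choices[i]]
    return res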

And some performance tests:

In [30]: %timeit loop()
10000 loops, best of 3: 140 µs per loop

In [31]: %timeit x[np.arange(100),choices,:]
10000 loops, best of 3: 23.7 µs per loop
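To cross-check the vectorized result against the loop, an equivalent NumPy formulation is np.take_along_axis (available since NumPy 1.15). The index array must have the same number of dimensions as x, so it is reshaped to (b, 1, 1) and broadcast over the last axis. This is a sketch for verification, not a claim about which form is faster:

```python
import numpy as np

x = np.random.randint(0, 10, (100, 200, 300))
choices = np.random.randint(0, 200, 100)

fancy = x[np.arange(100), choices, :]

# take_along_axis wants an index array with x's ndim: (100,) -> (100, 1, 1),
# which broadcasts over the last axis; the singleton axis 1 is then dropped.
along = np.take_along_axis(x, choices[:, None, None], axis=1)[:, 0, :]

print(fancy.shape, np.array_equal(fancy, along))  # (100, 300) True
```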

Here the indexing method is only about 6 times faster than the loop, because the task (pure extraction) cannot take advantage of memory alignment.

Finally, you can speed up the loop with just-in-time compilation: loop2 = numba.njit(loop).

In [32]: %timeit loop2()
10000 loops, best of 3: 32 µs per loop

which shows that the indexing method is still the fastest of the three.

Upvotes: 1
