Stewart Holmes
Stewart Holmes

Reputation: 23

NumPy indexing with varying position

I have an array input_data of shape (A, B, C), and an array ind of shape (B,). I want to loop through the B axis and take the sum of elements C[B[i]] and C[B[i]+1]. The desired output is of shape (A, B). I have the following code which works, but I feel is inefficient due to index-based looping through the B axis. Is there a more efficient method?

import numpy as np

input_data = np.random.rand(2, 6, 10)
ind = [ 2, 3, 5, 6, 5, 4 ]

out = np.zeros( ( input_data.shape[0], input_data.shape[1] ) )

for i in range( len(ind) ):
    d = input_data[:, i, ind[i]:ind[i]+2]
    out[:, i] = np.sum(d, axis = 1)

Edited based on Divakar's answer:

import timeit
import numpy as np

N = 1000

input_data = np.random.rand(10, N, 5000)
ind = ( 4999 * np.random.rand(N) ).astype(np.int)

def test_1(): # Old loop-based method
    out = np.zeros( ( input_data.shape[0], input_data.shape[1] ) )

    for i in range( len(ind) ):
        d = input_data[:, i, ind[i]:ind[i]+2]
        out[:, i] = np.sum(d, axis = 1)
    return out

def test_2(): 
    extent = 2 # Comes from 2 in "ind[i]:ind[i]+2"

    m,n,r = input_data.shape
    idx = (np.arange(n)*r + ind)[:,None] + np.arange(extent)
    out1 = input_data.reshape(m,-1)[:,idx].reshape(m,n,-1).sum(2)
    return out1

print timeit.timeit(stmt = test_1, number = 1000)
print timeit.timeit(stmt = test_2, number = 1000)

print np.all( test_1() == test_2(), keepdims = True )

>> 7.70429363482
>> 0.392034666757
>> [[ True]]

Upvotes: 2

Views: 103

Answers (2)

Divakar
Divakar

Reputation: 221554

Here's a vectorized approach using linear indexing with some help from broadcasting. We merge the last two axes of the input array, calculate the linear indices corresponding to the last two axes, perform slicing and reshape back to a 3D shape. Finally, we do summation along the last axis to get the desired output. The implementation would look something like this -

extent = 2 # Comes from 2 in "ind[i]:ind[i]+2"

m,n,r = input_data.shape
idx = (np.arange(n)*r + ind)[:,None] + np.arange(extent)
out1 = input_data.reshape(m,-1)[:,idx].reshape(m,n,-1).sum(2)

If the extent is always going to be 2 as stated in the question - "... sum of elements C[B[i]] and C[B[i]+1]", then you could simply do -

m,n,r = input_data.shape
ind_arr = np.array(ind)
axis1_r = np.arange(n)
out2 = input_data[:,axis1_r,ind_arr] + input_data[:,axis1_r,ind_arr+1]

Upvotes: 1

unutbu
unutbu

Reputation: 879481

You could also use integer array indexing combined with basic slicing:

import numpy as np

m,n,r = 2, 6, 10
input_data = np.arange(2*6*10).reshape(m, n, r)
ind = np.array([ 2, 3, 5, 6, 5, 4 ])
out = np.zeros( ( input_data.shape[0], input_data.shape[1] ) )
for i in range( len(ind) ):
    d = input_data[:, i, ind[i]:ind[i]+2]
    out[:, i] = np.sum(d, axis = 1)


out2 = input_data[:, np.arange(n)[:,None], np.add.outer(ind,range(2))].sum(axis=-1)
print(out2)
# array([[  5,  27,  51,  73,  91, 109],
#        [125, 147, 171, 193, 211, 229]])

assert np.allclose(out, out2)

Upvotes: 1

Related Questions