DLunin
DLunin

Reputation: 1070

Fast vectorized indexing in numpy

Suppose we have an array of indices of another numpy array:

import numpy as np
a = np.array([0, 3, 1])
b = np.array([0, 10, 20, 30, 40, 50, 60, 70])

We can use the array a as index directly:

b[a] # np.array([0, 30, 10])

But what if array a has more than one dimension? For example,

a = np.array([[0, 2], [1, 3], [2, 4]])
# I want to get b[a] = np.array([[0, 20], [10, 30], [20, 40]])

Numpy indexing doesn't work if the number of dimensions of a is greater than 1. We can achieve the desired result by using map

map(lambda x: b[x], a)

However, it is quite slow. For 1-dimensional case, direct indexing is about 10-100 times faster than using map.

Is there a way to do it faster?

Upvotes: 3

Views: 2259

Answers (2)

hpaulj
hpaulj

Reputation: 231615

What's the problem? I can index b with a 2d array. The output just matches a1 in shape:

In [64]: b = np.array([0, 10, 20, 30, 40, 50, 60, 70])
In [65]: a1 = np.array([[0, 2], [1, 3], [2, 4]])
In [66]: b[a1]
Out[66]: 
array([[ 0, 20],
       [10, 30],
       [20, 40]])

b[a1] is not the same as b[a1[:,0],a1[:,1]]. That is the 2 columns of a1 do not provide two indices (which would require a 2d b).

Upvotes: 2

Divakar
Divakar

Reputation: 221674

There is built-in np.take for exactly that same task -

np.take(b,a)

You can also flatten a with .ravel(), index into b and reshape back to a's shape -

b[a.ravel()].reshape(a.shape)

These NumPy based approaches would be much better than map(lambda x: b[x], a) both in terms of both performance and memory as using map would give us list of arrays.

Sample run -

In [34]: a
Out[34]: 
array([[0, 2],
       [1, 3],
       [2, 4]])

In [35]: b
Out[35]: array([ 0, 10, 20, 30, 40, 50, 60, 70])

In [36]: np.take(b,a)
Out[36]: 
array([[ 0, 20],
       [10, 30],
       [20, 40]])

In [37]: b[a.ravel()].reshape(a.shape)
Out[37]: 
array([[ 0, 20],
       [10, 30],
       [20, 40]])

Runtime tests -

In [39]: a = np.random.randint(0,100,(200,100))

In [40]: b = np.random.randint(0,100,(20000))

In [41]: %timeit map(lambda x: b[x], a)
1000 loops, best of 3: 643 µs per loop

In [42]: %timeit np.take(b,a)
10000 loops, best of 3: 105 µs per loop

In [43]: %timeit b[a.ravel()].reshape(a.shape)
1000 loops, best of 3: 231 µs per loop

Upvotes: 1

Related Questions