InternetUser0947

Reputation: 43

How can I replace values in a Python NumPy array with the index of those values found in another array?

I have an n*m array "a", and another 1D array "b", such as the following:

a = array([[ 51, 30, 20, 10],
           [ 10, 32, 65, 77],
           [ 15, 20, 77, 30]])

b = array([10, 15, 20, 30, 32, 51, 65, 77])

I would like to replace every element of "a" with the index at which that element occurs in "b". In the case above, I would like the output to be:

a = array([[ 5, 3, 2, 0],
           [ 0, 4, 6, 7],
           [ 1, 2, 7, 3]])

Please note that in my real application the arrays are large: each has over 30k elements, and there are several thousand of them. I have tried for loops, but these take a long time to compute. I have also tried similar iterative methods, as well as list.index() to grab the indices, but these are also too slow.

Can anyone help me in identifying first the indices of "b" for the elements of "a" which appear in "b", and then constructing the updated "a" array?

Thank you.

Upvotes: 2

Views: 1691

Answers (2)

LuWil

Reputation: 56

This is posted as an answer only because it is too long for a comment. It supports orlp's lookup-table solution. NumPy's vectorize avoids writing an explicit loop (it still loops internally), but it is clearly not the fastest approach. Note that NumPy's searchsorted can only be applied as shown when b is sorted.

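import timeit
import numpy as np

# Test data: a holds values in [1, 100); b is sorted and contains every value in a.
a = np.random.randint(1, 100, (1000, 1000))
b = np.arange(0, 1000, 1)

def o1():
    # Lookup-table approach from orlp's answer.
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    lut = np.zeros(hi - lo + 1, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    a2 = lut[a - lo]
    return a2

def o2():
    # np.vectorize approach: for each index i, overwrite entries equal to b[i] with i.
    a2 = a.copy()
    fu = np.vectorize(lambda i: np.place(a2, a2 == b[i], i))
    fu(np.arange(0, len(b), 1))
    return a2

# Time searchsorted, the lookup table, and np.vectorize, two runs each.
print(timeit.timeit("np.searchsorted(b, a)", globals=globals(), number=2))
print(timeit.timeit("o1()", globals=globals(), number=2))
print(timeit.timeit("o2()", globals=globals(), number=2))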

prints (times in seconds for searchsorted, o1 and o2, in that order)

0.061956800000189105
0.012765400000716909
2.220097600000372
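
For the case where b is not sorted, one possible workaround is to pass a sort permutation to searchsorted through its sorter argument and then map the positions back to the original order. The following is only a sketch: the helper name index_in, the unsorted sample b_unsorted, and the use of -1 to mark values absent from b are assumptions made for illustration, not part of the answers here.

```python
import numpy as np

def index_in(a, b):
    """Return, for each element of a, its index in b (or -1 if absent)."""
    order = np.argsort(b)                      # permutation that sorts b
    pos = np.searchsorted(b, a, sorter=order)  # positions in the sorted view of b
    pos = np.clip(pos, 0, len(b) - 1)          # keep out-of-range positions indexable
    idx = order[pos]                           # map back to positions in the original b
    return np.where(b[idx] == a, idx, -1)      # flag values that are not in b

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b_unsorted = np.array([51, 10, 77, 20, 15, 30, 65, 32])  # hypothetical unsorted b
result = index_in(a, b_unsorted)
```

Indexing b_unsorted with result should reproduce a, which is a quick way to check the mapping. This was not part of the timings above.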

Upvotes: 0

orlp

Reputation: 117781

If the integer values in a and b span a small range (or at least one small enough for a table of that size to fit into RAM), this can be done very quickly using a lookup table:

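import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

# Offset by the smallest value so the table can be indexed from 0.
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
# lut[v - lo] holds the index of value v in b.
lut = np.zeros(hi - lo + 1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))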

Then:

>>> a_indices = lut[a - lo]
>>> a_indices
array([[5, 3, 2, 0],
       [0, 4, 6, 7],
       [1, 2, 7, 3]])
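
One caveat with the table above: because lut starts out as zeros, any value of a that does not occur in b is silently mapped to index 0. A minimal sketch of a variant that marks such values instead, assuming -1 is an acceptable marker (the function name lut_indices and its missing parameter are made up for this illustration):

```python
import numpy as np

def lut_indices(a, b, missing=-1):
    """Map each element of integer array a to its index in b via a lookup table."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Fill with the marker first so values absent from b stay recognizable.
    lut = np.full(hi - lo + 1, missing, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    return lut[a - lo]

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])
a_indices = lut_indices(a, b)
```

Comparing the result against the marker (for example a_indices == -1) then shows which entries of a had no match in b.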

Upvotes: 1
