Reputation: 43
I have an n*m array "a", and another 1D array "b", such as the following:
a = array([[ 51, 30, 20, 10],
[ 10, 32, 65, 77],
[ 15, 20, 77, 30]])
b = array([10, 15, 20, 30, 32, 51, 65, 77])
I would like to replace each element of "a" with the index at which that value appears in "b". In the case above, I would like the output to be:
a = array([[ 5, 3, 2, 0],
[ 0, 4, 6, 7],
[ 1, 2, 7, 3]])
Please note that in my real application the arrays are large: each has over 30k elements, and there are several thousand of them. I have tried for loops, but these take a long time to compute. I have also tried similar iterative methods, including using list.index() to grab the indices, but this also takes too much time.
Can anyone help me in identifying first the indices of "b" for the elements of "a" which appear in "b", and then constructing the updated "a" array?
Thank you.
Upvotes: 2
Views: 1691
Reputation: 56
This is posted as an answer only because it is too long for a comment. It supports orlp's solution posted above. NumPy's vectorize avoids writing an explicit loop, but as the timings below show, it is by far the slowest of the three approaches. Note that NumPy's searchsorted can only be applied as shown when b is sorted.
import timeit
import numpy as np

a = np.random.randint(1, 100, (1000, 1000))
b = np.arange(0, 1000, 1)

def o1():
    # Lookup-table approach: lut[value - lo] holds the index of value in b
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    lut = np.zeros(hi - lo + 1, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    a2 = lut[a - lo]
    return a2

def o2():
    # np.vectorize approach: for each index i, overwrite matches of b[i] in place
    a2 = a.copy()
    fu = np.vectorize(lambda i: np.place(a2, a2 == b[i], i))
    fu(np.arange(0, len(b), 1))

print(timeit.timeit("np.searchsorted(b, a)", globals=globals(), number=2))
print(timeit.timeit("o1()", globals=globals(), number=2))
print(timeit.timeit("o2()", globals=globals(), number=2))
prints
0.061956800000189105
0.012765400000716909
2.220097600000372
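Since searchsorted only works directly when b is sorted, here is a minimal sketch of one way to handle an unsorted b, using the sorter argument of np.searchsorted together with np.argsort. The array b_unsorted and the names order, pos and idx are made up for illustration, and the sketch assumes every element of a actually occurs in b.

```python
import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
# Hypothetical unsorted version of b, for illustration only
b_unsorted = np.array([77, 10, 51, 20, 65, 15, 32, 30])

# Permutation that would sort b_unsorted
order = np.argsort(b_unsorted)
# Positions of the elements of a within the sorted view of b_unsorted
pos = np.searchsorted(b_unsorted, a, sorter=order)
# Map positions in the sorted view back to indices into b_unsorted
idx = order[pos]

# Sanity check: indexing b_unsorted with idx reproduces a
print(np.array_equal(b_unsorted[idx], a))
```

If a may contain values that are missing from b, pos can point at a neighbouring element (or one past the end), so a check such as comparing b_unsorted[idx] with a would be needed before trusting the result.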
Upvotes: 0
Reputation: 117781
If the minimum and maximum elements of a and b span a small range (or at least one small enough for a lookup table to fit into RAM), this can be done very quickly using a lookup table:
import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

# Range of values that the lookup table has to cover
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
# lut[value - lo] stores the index of value in b
lut = np.zeros(hi - lo + 1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))
Then:
>>> a_indices = lut[a - lo]
>>> a_indices
array([[5, 3, 2, 0],
[0, 4, 6, 7],
[1, 2, 7, 3]])
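One caveat with the table above: because it is initialised with zeros, any element of a that does not occur in b would silently map to index 0. A possible variation, sketched here under the assumption that -1 is an acceptable "not found" marker, fills the table with -1 instead. The value 99 in a is a made-up example of an element that is absent from b.

```python
import numpy as np

a = np.array([[51, 30, 99, 10],
              [10, 32, 65, 77]])  # 99 is deliberately absent from b
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
# Fill with -1 so values missing from b are flagged instead of mapped to 0
lut = np.full(hi - lo + 1, -1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))

a_indices = lut[a - lo]
print(a_indices)
# [[ 5  3 -1  0]
#  [ 0  4  6  7]]
```

The missing entries can then be located with a_indices == -1.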
Upvotes: 1