farhawa
farhawa

Reputation: 10398

Finding indices of matches of one array in another array

I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A. Now I am looking for a way to get the index of B's values within A.

For example:

A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9

Upvotes: 38

Views: 28921

Answers (4)

Mohammadreza Riahi
Mohammadreza Riahi

Reputation: 602

I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.

As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.

I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.

In [1]: import numpy as np
In [2]: from numba import njit

In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])

In [5]: np.where(np.in1d(a, b))[0]
   ...: array([0, 6, 9])

In [6]: np.nonzero(np.in1d(a, b))[0]
   ...: array([0, 6, 9])   

In [7]: np.searchsorted(a, b)
   ...: array([0, 6, 9])


In [8]: np.searchsorted(a, np.intersect1d(a, b))
   ...: array([0, 6, 9])

In [9]: [i for i, x in enumerate(a) if x in b]
   ...: [0, 6, 9]

In [10]: @njit
    ...: def func(a, b):
    ...:     return [i for i, x in enumerate(a) if x in b]

In [11]: func(a, b)
   ...: [0, 6, 9]

Now, let's compare the timing performance of these solutions.

In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Upvotes: 0

Paul Panzer
Paul Panzer

Reputation: 53029

Just for completeness: If the values in A are non negative and reasonably small:

lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices  = lookup[B]

Upvotes: 2

Bi Rico
Bi Rico

Reputation: 25813

Have you tried searchsorted?

A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])

A.searchsorted(B)
# array([0, 6, 9])

Upvotes: 11

Divakar
Divakar

Reputation: 221504

You can use np.in1d with np.nonzero -

np.nonzero(np.in1d(A,B))[0]

You can also use np.searchsorted, if you care about maintaining the order -

np.searchsorted(A,B)

For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -

sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]

I would add in my favorite broadcasting too in the mix to solve a generic case -

np.nonzero(B[:,None] == A)[1]

Sample run -

In [125]: A
Out[125]: array([ 7,  5,  1,  6, 10,  9,  8])

In [126]: B
Out[126]: array([ 1, 10,  7])

In [127]: sort_idx = A.argsort()

In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])

In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])

Upvotes: 46

Related Questions