Reputation: 11
I have an ndarray A (a 2D image, for example) whose values are integers from 0 to N.
I have another list or array B containing numbers in the range 0 to N.
I want to compare the first array to every element of the second list in order to obtain a new ndarray indicating, for each pixel, whether its value is in the list.
A is around 10000 * 10000
B is a list having 10000-100000 values.
N goes up to 500 000
I already tried for loops; they work, but they are really slow because my matrices are really big. I also tried .any() and numpy's compare function, but did not manage to obtain the desired result.
Here is an example of the result I wish to obtain (c is the expected output for a and b):
a = np.array([2, 23, 15, 0, 7, 5, 3])
b = np.array([3,7,17])
c = np.array([False, False, False, False, True, False, True])
Upvotes: 1
Views: 57
Reputation: 36309
You can reshape the array a to have an extra dimension, which will be used for comparing with b, and then use np.any along that dimension:
>>> np.any(a[..., None] == b, axis=-1)
array([False, False, False, False, True, False, True])
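To make the broadcasting step concrete, here is a minimal sketch that splits the one-liner into its intermediate arrays, using the a and b from the question. The names expanded, table and result are only illustrative.

```python
import numpy as np

a = np.array([2, 23, 15, 0, 7, 5, 3])
b = np.array([3, 7, 17])

# a[..., None] appends a length-1 axis: shape (7,) becomes (7, 1)
expanded = a[..., None]

# Broadcasting (7, 1) against (3,) gives a (7, 3) boolean table:
# row i holds the comparison of a[i] with every element of b
table = expanded == b

# Reduce over the last axis: True where any element of b matched
result = table.any(axis=-1)

print(expanded.shape, table.shape, result.shape)
print(result)
```

Note that the intermediate table has a.size * b.size entries. For the sizes mentioned in the question (A around 10000 * 10000 and B with 10000 values) that would be about 10^12 booleans, roughly 1 TB at one byte each, so this form is only practical for much smaller inputs.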
This approach is flexible, since it works with other element-wise comparison functions too. For example, for two float arrays we typically want to compare with np.isclose instead of np.equal, and we can do so by simply exchanging the comparison function:
>>> np.any(np.isclose(a[..., None], b), axis=-1)
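As a small illustration of why the np.isclose variant above matters for floats, the sketch below uses made-up float values (not from the question) where exact equality misses a match because of rounding.

```python
import numpy as np

# Illustrative float data: 0.1 + 0.2 is not exactly 0.3 in binary floating point
a = np.array([0.1 + 0.2, 1.0, 2.5])
b = np.array([0.3, 2.5])

# Exact comparison misses the first element
exact = np.any(a[..., None] == b, axis=-1)

# Tolerance-based comparison (default rtol/atol of np.isclose) finds it
close = np.any(np.isclose(a[..., None], b), axis=-1)

print(exact)
print(close)
```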
If equality is the criterion, however, then np.isin will perform better, since it doesn't need to go through an intermediate broadcasted array of shape a.shape + (b.size,), which would be reduced along the last axis anyway. That saves both memory and compute, since it neither allocates that array nor performs all the comparisons:
In [2]: a = np.random.randint(0, 100, size=(100, 100))
In [3]: b = np.random.randint(0, 100, size=1000)
In [4]: %timeit np.any(a[..., None] == b, axis=-1)
12.1 ms ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit np.isin(a, b)
608 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
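Since the question states that all values are non-negative integers bounded by N, another option that might be worth benchmarking is a boolean lookup table indexed by value. This is a different technique from the ones timed above, not something taken from the answers; isin_lookup and its parameter n are hypothetical names, and the sketch assumes every value in a and b lies in [0, n].

```python
import numpy as np

def isin_lookup(a, b, n):
    """Membership test via a lookup table.

    Assumes a and b contain only integers in the range [0, n].
    """
    # One boolean slot per possible value
    table = np.zeros(n + 1, dtype=bool)
    # Mark the values that occur in b
    table[b] = True
    # Index the table with a: the result has the same shape as a
    return table[a]

rng = np.random.default_rng(0)
n = 500
a = rng.integers(0, n + 1, size=(100, 100))
b = rng.integers(0, n + 1, size=50)

mask = isin_lookup(a, b, n)
print(mask.shape)
```

The table needs only n + 1 booleans (about 500 KB for N = 500 000), independent of the sizes of a and b. Whether it actually beats np.isin on a given machine and NumPy version is something to measure rather than assume.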
Upvotes: 1
Reputation: 500733
You could use numpy.in1d:
>>> np.in1d(a, b)
array([False, False, False, False, True, False, True], dtype=bool)
There's also numpy.isin, which is recommended for new code.
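For the 2D image case in the question, a short sketch of numpy.isin may help: unlike numpy.in1d, which returns a flat 1-D result, numpy.isin keeps the shape of its first argument. The small arrays below are illustrative only.

```python
import numpy as np

# A tiny 2D "image" and a list of values to look for
a = np.array([[2, 23, 15],
              [0, 7, 5]])
b = [3, 7, 17]

# Boolean mask with the same shape as a
mask = np.isin(a, b)
print(mask)

# invert=True marks the pixels whose value is NOT in b
not_in = np.isin(a, b, invert=True)
print(not_in)
```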
Upvotes: 1