user11823711

Reputation: 11

How to quickly compare every element of an ndarray with every element of a sorted list/array?

I have an ndarray A (a 2D image, for example) whose values are integers from 0 to N.

I have another list or array B containing numbers in the range 0 to N.

I want to test every element of A against B in order to obtain a new boolean ndarray indicating whether the value of each pixel is in B.

A is around 10000 * 10000

B is a list having 10000-100000 values.

N goes up to 500 000

I already tried for loops; they work, but they are really slow because my matrices are so large. I also tried .any() and NumPy's comparison functions but did not manage to obtain the desired result.

Here is an example of the result I wish to obtain:

import numpy as np

a = np.array([2, 23, 15, 0, 7, 5, 3])
b = np.array([3, 7, 17])
# desired result: True where the element of a appears in b
c = np.array([False, False, False, False, True, False, True])

Upvotes: 1

Views: 57

Answers (2)

a_guest

Reputation: 36309

You can reshape the array a to have an extra dimension which will be used for comparing with b and then use np.any along that dimension:

>>> np.any(a[..., None] == b, axis=-1)
array([False, False, False, False,  True, False,  True])

This approach is flexible since it works with other element-wise comparison functions too. For example, for two float arrays we typically want to compare with np.isclose instead of np.equal, and we can do so by simply swapping the comparison function:

>>> np.any(np.isclose(a[..., None], b), axis=-1)
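To illustrate why the comparison function matters for floats, here is a minimal sketch. The sample values are made up for illustration and are not from the question; np.isclose is used with its default tolerances.

```python
import numpy as np

# Illustrative float data (hypothetical, not from the question).
# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
a = np.array([0.1 + 0.2, 1.0, 2.5])
b = np.array([0.3, 2.5])

# Exact equality misses the first element because of rounding error.
exact = np.any(a[..., None] == b, axis=-1)

# Tolerance-based comparison treats it as a match.
close = np.any(np.isclose(a[..., None], b), axis=-1)

print(exact)
print(close)
```

Only the first element differs between the two results: exact equality rejects it, while np.isclose accepts it.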

If equality is the criterion, however, np.isin will perform better, since it does not need an intermediate broadcast array of shape a.shape + (b.size,) that is then reduced along the last axis anyway. It therefore saves both memory and compute, because it neither allocates that array nor performs all of those comparisons:

In [2]: a = np.random.randint(0, 100, size=(100, 100))

In [3]: b = np.random.randint(0, 100, size=1000)

In [4]: %timeit np.any(a[..., None] == b, axis=-1)
12.1 ms ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit np.isin(a, b)
608 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
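To put the memory argument into numbers for the sizes given in the question, here is a rough back-of-the-envelope sketch. It assumes one byte per boolean (NumPy's bool itemsize) and takes the lower bound of 10000 values for B; nothing is allocated, only the sizes are computed.

```python
import math
import numpy as np

a_shape = (10_000, 10_000)  # approximate size of A from the question
b_size = 10_000             # lower bound for len(B) from the question

bool_bytes = np.dtype(bool).itemsize  # 1 byte per boolean

# a[..., None] == b produces one boolean per (pixel, value) pair.
intermediate_bytes = math.prod(a_shape) * b_size * bool_bytes

# The final mask only needs one boolean per pixel.
result_bytes = math.prod(a_shape) * bool_bytes

print(f"broadcast intermediate: {intermediate_bytes / 1e12:.1f} TB")
print(f"final boolean mask:     {result_bytes / 1e6:.0f} MB")
```

Under these assumptions the broadcast intermediate would be far beyond typical RAM, which is why the broadcasting approach is only practical for small inputs.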

Upvotes: 1

NPE

Reputation: 500733

You could use numpy.in1d:

>>> np.in1d(a, b)
array([False, False, False, False,  True, False,  True], dtype=bool)

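There's also numpy.isin, which is recommended for new code. Unlike in1d, which always returns a flattened 1-D result, isin preserves the shape of a, which matters for a 2D image.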
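Since the question is about a 2D image, a small sketch of np.isin on a 2D array may help. The 3x3 array below is a made-up stand-in for the real image; b is the same as in the question.

```python
import numpy as np

# Tiny hypothetical "image"; the real A is around 10000 x 10000.
image = np.array([[2, 23, 15],
                  [0, 7, 5],
                  [3, 17, 4]])
b = np.array([3, 7, 17])

# Boolean mask with the same shape as the image.
mask = np.isin(image, b)

print(mask.shape)
print(mask)
```

The mask can then be used directly for boolean indexing, for example image[mask] to pull out the matching pixel values.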

Upvotes: 1
