arwright3

Reputation: 391

Find where a NumPy array is equal to any value in a list of values

I have an array of integers and want to find where that array is equal to any value in a list of multiple values.

This can easily be done by treating each value individually, or by using multiple "or" statements in a loop, but I feel like there must be a better/faster way to do it. I'm actually dealing with arrays of size 4000 x 2000, but here is a simplified version of the problem:

import numpy as np
fake = np.arange(9).reshape((3, 3))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

want = (fake==0) + (fake==2) + (fake==6) + (fake==8)

print(repr(want))

array([[ True, False,  True],
       [False, False, False],
       [ True, False,  True]], dtype=bool)

What I would like is a way to get want from a single command involving fake and the list of values [0, 2, 6, 8].

I'm assuming there is a package that has this included already that would be significantly faster than if I just wrote a function with a loop in Python.

Upvotes: 21

Views: 20610

Answers (3)

jpp

Reputation: 164683

NumPy 1.13+

As of NumPy v1.13, you can use np.isin, which works on multi-dimensional arrays:

>>> element = 2*np.arange(4).reshape((2, 2))
>>> element
array([[0, 2],
       [4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = np.isin(element, test_elements)
>>> mask
array([[False,  True],
       [ True, False]])

NumPy pre-1.13

np.in1d, used in the accepted answer, always returns a flat 1d result, so its output has to be reshaped back to the shape of the input. That is the approach to use on versions of NumPy before v1.13.
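For reference, here is a minimal sketch of np.isin applied to the array from the question. The names values and manual are only illustrative; manual rebuilds the explicit chain of comparisons so the two results can be checked against each other.

```python
import numpy as np

# Recreate the question's example array and the list of target values.
fake = np.arange(9).reshape((3, 3))
values = [0, 2, 6, 8]

# np.isin returns a boolean mask with the same shape as its first argument,
# so no reshape is needed.
want = np.isin(fake, values)

# The explicit chain of comparisons from the question, for comparison.
manual = (fake == 0) | (fake == 2) | (fake == 6) | (fake == 8)

print(np.array_equal(want, manual))
```

If the opposite mask is needed, np.isin also accepts invert=True, which should be equivalent to negating the result with ~.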

Upvotes: 17

Bas Swinckels

Reputation: 18488

The function numpy.in1d seems to do what you want. The only problem is that it returns a flat 1d result, so you have to reshape it back yourself, like this:

In [9]: np.in1d(fake, [0,2,6,8]).reshape(fake.shape)
Out[9]: 
array([[ True, False,  True],
       [False, False, False],
       [ True, False,  True]], dtype=bool)

I have no clue why this is limited to 1d arrays only. Looking at its source code, it first seems to flatten the two arrays, after which it does some clever sorting tricks. But nothing would stop it from unflattening the result at the end again, like I had to do by hand here.
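To illustrate the flatten, sort, and unflatten idea, here is a rough sketch of a shape-preserving membership test built on np.unique and np.searchsorted. The helper name isin_nd is hypothetical, and this is not a copy of NumPy's actual implementation, only one way the same idea could be written.

```python
import numpy as np

def isin_nd(arr, values):
    """Hypothetical helper: membership mask with the same shape as arr."""
    arr = np.asarray(arr)
    # Sort and de-duplicate the lookup values once.
    lookup = np.unique(values)
    if lookup.size == 0:
        # Nothing to match against, so every entry is False.
        return np.zeros(arr.shape, dtype=bool)
    flat = arr.ravel()
    # For each element, find the slot where it would sit in the sorted lookup.
    idx = np.searchsorted(lookup, flat)
    # Clamp indices that fall past the end so the indexing below stays valid.
    idx = np.clip(idx, 0, lookup.size - 1)
    # An element is a member exactly when the value in its slot matches it.
    return (lookup[idx] == flat).reshape(arr.shape)

fake = np.arange(9).reshape((3, 3))
mask = isin_nd(fake, [0, 2, 6, 8])
print(mask)
```

The final reshape is the "unflattening" step mentioned above; everything before it works on the flattened data.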

Upvotes: 21

shx2

Reputation: 64318

@Bas's answer is the one you're probably looking for. But here's another way to do it, using numpy's vectorize trick:

import numpy as np
S = {0, 2, 6, 8}

@np.vectorize
def contained(x):
    return x in S

contained(fake)
=> array([[ True, False,  True],
          [False, False, False],
          [ True, False,  True]], dtype=bool)

The drawback of this solution is that contained() is called once per element in Python code, which makes it much slower than a pure-NumPy solution.
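A rough way to see the difference on your own machine is to time both approaches with timeit. This is only a sketch: the array size, the random data, and the repeat count are arbitrary assumptions, it needs NumPy 1.17+ for default_rng and 1.13+ for np.isin, and the actual timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

# Arbitrary test data: a smaller stand-in for the 4000 x 2000 array.
rng = np.random.default_rng(0)
big = rng.integers(0, 10, size=(400, 200))
values = [0, 2, 6, 8]
S = set(values)

# Same idea as contained() above, written inline.
contained = np.vectorize(lambda x: x in S)

# Both approaches should produce the same mask.
same = np.array_equal(contained(big), np.isin(big, values))

# Time a few repetitions of each approach.
t_vec = timeit.timeit(lambda: contained(big), number=3)
t_isin = timeit.timeit(lambda: np.isin(big, values), number=3)
print(f"same result: {same}")
print(f"np.vectorize: {t_vec:.4f} s, np.isin: {t_isin:.4f} s")
```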

Upvotes: 5
