user2056257

Reputation: 141

Numpy search for elements of an array in a subset

Suppose I have numpy arrays

a = np.array([1,3,5,7,9,11,13])
b = np.array([3,5,7,11,13])

and I want to create a boolean array of the size of a where each entry is True or False depending on whether the element of a is also in b.

So in this case, I want

a_b = np.array([False,True,True,True,False,True,True])

I can do this when b consists of a single element with a == b[0]. Is there a quick way to do this when b has length greater than 1?

Upvotes: 2

Views: 1484

Answers (1)

ely

Reputation: 77424

Use numpy.in1d:

In [672]: np.in1d([1,2,3,4], [1,2])
Out[672]: array([ True,  True, False, False], dtype=bool)

For your data:

In [674]: np.in1d(a, b)
Out[674]: array([False,  True,  True,  True, False,  True,  True], dtype=bool)
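A note for readers on newer NumPy releases: np.isin (added in NumPy 1.13) performs the same membership test and preserves the shape of its first argument, and np.in1d was deprecated in NumPy 2.0. A minimal sketch with the arrays from the question, assuming a NumPy version that provides np.isin:

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13])
b = np.array([3, 5, 7, 11, 13])

# Same membership test as np.in1d(a, b); the result has the shape of `a`
a_b = np.isin(a, b)
print(a_b)
```

For 1-D input like this, the result matches what np.in1d(a, b) returns.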

This is available in version 1.4.0 or later according to the docs. The docs also describe how the operation might look in pure Python:

in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is roughly equivalent to np.array([item in b for item in a]).
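To make that equivalence concrete, here is a small sketch that builds the list-comprehension version next to the NumPy call and compares them. It uses np.isin as the membership function, which is my substitution for np.in1d on newer NumPy versions; the variable names slow and fast are just for illustration:

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13])
b = np.array([3, 5, 7, 11, 13])

# Pure-Python flavour: one `in` test per element of `a`
slow = np.array([item in b for item in a])

# Vectorized membership test
fast = np.isin(a, b)

print(np.array_equal(slow, fast))
```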

The docs for this function are worthwhile to read as there is the invert keyword argument and the assume_unique keyword argument -- each of which can be quite useful in some situations.
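As a rough illustration of those two keyword arguments, again using the arrays from the question (np.isin accepts the same invert and assume_unique keywords as np.in1d; the variable names are mine):

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13])
b = np.array([3, 5, 7, 11, 13])

# invert=True flags the elements of `a` that are NOT in `b`
not_in_b = np.isin(a, b, invert=True)

# assume_unique=True is only valid when both inputs contain no duplicates,
# which holds for these two arrays; it can let NumPy skip some work
in_b = np.isin(a, b, assume_unique=True)

print(not_in_b)
print(in_b)
```

Note that passing assume_unique=True on arrays that do contain duplicates can give wrong results, so it is worth using only when you know the inputs are unique.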

I also found it interesting to create my own version using np.vectorize and operator.contains:

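import numpy as np
from operator import contains
# excluded={1} keeps the second argument (the sequence b) whole instead of broadcasting it
v_in = np.vectorize(lambda x, y: contains(y, x), excluded={1})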

and then:

In [696]: v_in([1,2,3, 2], [1, 2])
Out[696]: array([ True,  True, False,  True], dtype=bool)

Because operator.contains takes the container first (contains(b, x) means x in b), I needed the lambda to make the calling convention match your use case. You could skip the lambda if it is okay to call with b first and then a.

But note that you need the excluded option for vectorize: whichever argument represents the b sequence (the sequence to check for membership in) has to remain a whole sequence rather than being broadcast element by element. So if you chose not to flip the contains arguments with the lambda, you would exclude index 0 instead of 1.
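The no-lambda variant described above could look roughly like this. The name v_contains is made up for the example; the sequence to search goes first and positional index 0 is excluded:

```python
import numpy as np
from operator import contains

# operator.contains(seq, item) means `item in seq`, so the sequence comes first.
# Excluding positional index 0 passes that sequence through untouched.
v_contains = np.vectorize(contains, excluded={0})

a = [1, 3, 5, 7, 9, 11, 13]
b = [3, 5, 7, 11, 13]

result = v_contains(b, a)  # note the order: b first, then a
print(result)
```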

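The in1d approach will be much faster and is the better choice, since it relies on a well-known NumPy function, whereas np.vectorize is essentially a Python-level for loop provided for convenience rather than performance. But it's good to know how to do these tricks with operator and vectorize sometimes.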
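If you want to check the speed difference on your own machine, a quick sketch with timeit follows. The array sizes and the random data are arbitrary choices of mine, and np.isin stands in for np.in1d; actual timings will vary, so none are shown:

```python
import timeit
from operator import contains

import numpy as np

v_in = np.vectorize(lambda x, y: contains(y, x), excluded={1})

rng = np.random.default_rng(0)
a = rng.integers(0, 1000, size=2000)  # values to test
b = list(range(0, 1000, 10))          # sequence to search in

# Time a few repetitions of each approach
t_vec = timeit.timeit(lambda: v_in(a, b), number=3)
t_isin = timeit.timeit(lambda: np.isin(a, b), number=3)

print(f"np.vectorize: {t_vec:.4f} s, np.isin: {t_isin:.4f} s")
```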

You could even create a Python Infix recipe instance for this and then use v_in as an "infix" operation:

v_in = Infix(np.vectorize(lambda x,y: contains(y, x), excluded={1,}))
# even easier: v_in = Infix(np.in1d)

and example usage:

In [702]: v_in([1, 2, 3, 2], [1, 2])
Out[702]: array([ True,  True, False,  True], dtype=bool)

In [704]: [1, 2, 3, 2] <<v_in>> [1, 2]
Out[704]: array([ True,  True, False,  True], dtype=bool)
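The Infix class itself is not defined in this answer. A minimal sketch of what such a recipe typically looks like is below; this is my own stripped-down version that supports only the << ... >> spelling, not the exact recipe code, and it wraps np.isin rather than np.in1d:

```python
import numpy as np


class Infix:
    """Hypothetical minimal Infix wrapper around a two-argument function."""

    def __init__(self, func):
        self.func = func

    def __call__(self, left, right):
        # Still usable as an ordinary function call
        return self.func(left, right)

    def __rlshift__(self, left):
        # Handles `left << infix`: remember the left operand
        return Infix(lambda right: self.func(left, right))

    def __rshift__(self, right):
        # Handles `partial >> right`: supply the right operand and evaluate
        return self.func(right)


v_in = Infix(np.isin)

result = [1, 2, 3, 2] << v_in >> [1, 2]
print(result)
```

This works with plain lists on the left because list does not define <<, so Python falls back to Infix.__rlshift__. A NumPy array on the left would try to handle << itself, so this sketch would need extra care in that case.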

Upvotes: 5
