Reputation: 141
Suppose I have numpy arrays
a = np.array([1,3,5,7,9,11,13])
b = np.array([3,5,7,11,13])
and I want to create a boolean array of the size of a where each entry is True or False depending on whether the element of a is also in b.
So in this case, I want
a_b = np.array([False,True,True,True,False,True,True]).
I can do this when b consists of one element as a == b[0]. Is there a quick way to do this when b has length greater than 1?
Upvotes: 2
Views: 1484
Reputation: 77424
Use numpy.in1d:
In [672]: np.in1d([1,2,3,4], [1,2])
Out[672]: array([ True, True, False, False], dtype=bool)
For your data:
In [674]: np.in1d(a, b)
Out[674]: array([False, True, True, True, False, True, True], dtype=bool)
This is available in version 1.4.0 or later according to the docs. The docs also describe how the operation might look in pure Python:
in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is roughly equivalent to np.array([item in b for item in a]).
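To make that equivalence concrete, here is a minimal sketch using the arrays from the question. It assumes NumPy 1.13 or later, where np.isin is the name the NumPy docs recommend for new code in place of in1d; on older versions np.in1d(a, b) can be substituted.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13])
b = np.array([3, 5, 7, 11, 13])

# Vectorized membership test: one boolean per element of a.
mask = np.isin(a, b)

# The rough pure-Python equivalent quoted from the docs.
pure = np.array([item in b for item in a])

# The mask can be used directly to pick out the shared elements.
shared = a[mask]
```

Both mask and pure should hold the same booleans, and shared contains the elements of a that also appear in b.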
The docs for this function are worth reading, since it also takes the invert and assume_unique keyword arguments, each of which can be quite useful in some situations.
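As a rough illustration of those two keyword arguments (again assuming np.isin from NumPy 1.13 or later; np.in1d accepts the same keywords):

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13])
b = np.array([3, 5, 7, 11, 13])

# invert=True flips the result: True where the element of a is NOT in b.
not_in_b = np.isin(a, b, invert=True)

# assume_unique=True tells NumPy that neither input contains duplicates,
# which can let it skip some work. That holds for these a and b; the
# result is not guaranteed to be correct if the inputs do have duplicates.
in_b = np.isin(a, b, assume_unique=True)
```

Here not_in_b is the element-wise negation of in_b.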
I also found it interesting to create my own version using np.vectorize and operator.contains:
from operator import contains
v_in = np.vectorize(lambda x,y: contains(y, x), excluded={1,})
and then:
In [696]: v_in([1,2,3, 2], [1, 2])
Out[696]: array([ True, True, False, True], dtype=bool)
Because operator.contains takes the container first and the item second, I needed the lambda to make the calling convention match your use case, but you could skip this if it was okay to call with b first, then a.
But note that you need to use the excluded option for vectorize, since whichever argument represents the b sequence (the sequence to check for membership within) must remain a whole sequence rather than being looped over element by element. So if you chose not to flip the contains arguments with the lambda, you would want to exclude index 0, not 1.
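For comparison, a sketch of that unflipped variant: contains is vectorized directly, so the container comes first and index 0 is the one excluded. The name v_contains is just an illustrative choice.

```python
from operator import contains

import numpy as np

# operator.contains(container, item) is equivalent to `item in container`.
# Excluding index 0 keeps the container intact while the items at
# index 1 are looped over one at a time.
v_contains = np.vectorize(contains, excluded={0})

# Note the order: b (the container) first, then a (the items to test).
result = v_contains([1, 2], [1, 2, 3, 2])
```

The result has one boolean per item in the second argument.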
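The in1d approach will surely be much faster and is the better way, since it relies on a well-known NumPy function, whereas np.vectorize is essentially a Python-level loop provided for convenience rather than performance. But it's good to know how to do these tricks with operator and vectorize sometimes.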
You could even create an instance of the Python Infix recipe for this and then use v_in as an "infix" operation:
v_in = Infix(np.vectorize(lambda x,y: contains(y, x), excluded={1,}))
# even easier: v_in = Infix(np.in1d)
and example usage:
In [702]: v_in([1, 2, 3, 2], [1, 2])
Out[702]: array([ True, True, False, True], dtype=bool)
In [704]: [1, 2, 3, 2] <<v_in>> [1, 2]
Out[704]: array([ True, True, False, True], dtype=bool)
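The Infix class itself isn't shown above. A minimal sketch of what such a recipe might look like follows; this is my own pared-down version, not the original recipe, and it supports only the << ... >> form and plain calls. It wraps np.isin (assuming NumPy 1.13 or later) rather than the vectorized lambda.

```python
import numpy as np


class Infix:
    """Pared-down sketch of an "infix operator" wrapper (illustrative only)."""

    def __init__(self, func):
        self.func = func

    def __rlshift__(self, left):
        # Handles `left << infix`: remember the left operand and
        # return a new wrapper that is waiting for the right operand.
        return Infix(lambda right: self.func(left, right))

    def __rshift__(self, right):
        # Handles `bound >> right`: apply the stored function.
        return self.func(right)

    def __call__(self, left, right):
        # Still usable as an ordinary two-argument function.
        return self.func(left, right)


v_in = Infix(np.isin)

called = v_in([1, 2, 3, 2], [1, 2])
infixed = [1, 2, 3, 2] <<v_in>> [1, 2]
```

The left operand here is a plain list on purpose: a list has no << operator of its own, so Python falls back to Infix.__rlshift__. A NumPy array on the left would try to handle << itself first, so this sketch would need more work for that case.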
Upvotes: 5