Reputation: 313
I want to create a 'mask' index array for an array, based on whether the elements of that array are members of some set. What I want can be achieved as follows:
x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}
x_mask = np.array([xi in interesting_numbers for xi in x])
I'm wondering if there's a faster way to execute that last line. As it is, it builds a list in Python by repeatedly calling a __contains__ method, then converts that list to a numpy array.
I'd like to write something like x_mask = x[x in interesting_numbers], but that doesn't work: the in operator tests the whole array against the set at once instead of element by element.
Upvotes: 1
Views: 629
Reputation: 221574
Here's one approach with np.searchsorted -
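def set_membership(x, interesting_numbers):
    # Sorted array of the set's values, as np.searchsorted requires
    b = np.sort(list(interesting_numbers))
    # Position at which each element of x would be inserted into b
    idx = np.searchsorted(b, x)
    # Elements larger than b.max() get idx == b.size; reset to 0 to stay in bounds
    idx[idx==b.size] = 0
    # An element is a member exactly where b holds the same value at that position
    return b[idx] == x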
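To see what each step does, here is a small trace of the same idea on the example from the question. It is only a sketch: the steps are written inline rather than as a function, and it assumes interesting_numbers is non-empty (with an empty set, b[idx] would fail).

```python
import numpy as np

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}

# Sorted lookup table built from the set
b = np.sort(list(interesting_numbers))

# For each element of x, the leftmost position in b where it could be inserted
idx = np.searchsorted(b, x)

# x = 19 is larger than every value in b, so its position is b.size (out of bounds);
# point it at 0 instead, where the comparison below evaluates to False
idx[idx == b.size] = 0

# True only where the value stored at that position equals the element itself
x_mask = b[idx] == x

print(x[x_mask])
```

The sort costs O(m log m) for m interesting numbers and each lookup is a binary search, so the whole thing stays vectorized with no Python-level loop.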
Runtime test -
# Setup inputs with random numbers that are not necessarily sorted
In [353]: x = np.random.choice(100000, 10000, replace=0)
In [354]: interesting_numbers = set(np.random.choice(100000, 1000, replace=0))
In [355]: x_mask = np.array([xi in interesting_numbers for xi in x])
# Verify output with set_membership
In [356]: np.allclose(x_mask, set_membership(x, interesting_numbers))
Out[356]: True
# @Psidom's solution
In [357]: %timeit np.in1d(x, list(interesting_numbers))
1000 loops, best of 3: 1.04 ms per loop
In [358]: %timeit set_membership(x, interesting_numbers)
1000 loops, best of 3: 682 µs per loop
Upvotes: 1
Reputation: 214967
You can use np.in1d:
np.in1d(x, list(interesting_numbers))
#array([False, True, False, False, False, True, False, True, False,
# False, False, False, False, False, False, False, False, True,
# True, False], dtype=bool)
Timing: it is faster than the list comprehension when the array x is large:
x = np.arange(10000)
interesting_numbers = {1, 5, 7, 17, 18}
%timeit np.in1d(x, list(interesting_numbers))
# 10000 loops, best of 3: 41.1 µs per loop
%timeit x_mask = np.array([xi in interesting_numbers for xi in x])
# 1000 loops, best of 3: 1.44 ms per loop
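A note for newer NumPy versions: np.isin (available since NumPy 1.13) does the same job and is the recommended replacement, as np.in1d is deprecated as of NumPy 2.0. A minimal sketch using the example from the question; as above, the set is converted to a list first, since passing a Python set directly does not give an element-wise test:

```python
import numpy as np

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}

# Element-wise membership test; the result has the same shape as x
x_mask = np.isin(x, list(interesting_numbers))

print(x[x_mask])
```

Unlike np.in1d, np.isin preserves the shape of its first argument, so it also works on multi-dimensional arrays without flattening them.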
Upvotes: 3