Reputation: 8526
I have two NumPy arrays, `x` and `y`, each with around 2 million elements. `x` is sorted, but some of its values are repeated. The task is to remove the corresponding entries from both `x` and `y` wherever a value in `x` is repeated. My idea is to create a boolean mask. Here is what I have done so far:
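```python
import numpy as np

def createMask(x):
    # np.ones, not np.empty: np.empty leaves the array uninitialized,
    # so entries that are never assigned would hold arbitrary values
    idx = np.ones(x.shape, dtype=bool)
    for i in range(len(x) - 1):
        # Drop x[i] if the next value is identical (keeps the last of each run)
        if x[i + 1] == x[i]:
            idx[i] = False
    return idx

idx = createMask(x)
x = x[idx]
y = y[idx]
```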
This method works, but it is slow (705 ms with `%timeit`), and it looks clumsy. Is there a more elegant and efficient way? (I'm sure there is.)
Update with the best answer:
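The second method is

```python
idx = [x[i+1] == x[i] for i in range(len(x)-1)]
```

and the third (and fastest) method is

```python
idx = x[:-1] == x[1:]
```

Note that both produce `True` where `x[i] == x[i+1]` (the opposite sense of `createMask`) and have length `len(x) - 1`, so the result has to be negated and padded by one element before it can index `x` and `y`.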
The results (using IPython's `%timeit`) are:

- First method: 751 ms
- Second method: 618 ms
- Third method: 3.63 ms
Credit to mtitan8 for both methods.
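A minimal sketch of turning the third method into a drop-in replacement for `createMask`, assuming `x` is sorted so that equal values are adjacent. The helper name `dedup_sorted` is hypothetical. It negates the comparison and appends `True` for the last element, which, like the loop, keeps the last entry of each run of equal values:

```python
import numpy as np

def dedup_sorted(x, y):
    """Hypothetical helper: drop x[i] and y[i] wherever x[i] == x[i+1].

    Assumes x is sorted (equal values adjacent). Like createMask, it keeps
    the last element of each run of equal values.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if x.size == 0:
        return x, y
    # True where x[i] differs from its right neighbour; length len(x) - 1
    keep = x[:-1] != x[1:]
    # The last element has no right neighbour, so it is always kept
    keep = np.append(keep, True)
    return x[keep], y[keep]

x = np.array([1, 2, 2, 3, 3, 3, 4])
y = np.array([10, 20, 21, 30, 31, 32, 40])
x2, y2 = dedup_sorted(x, y)
print(x2)  # [1 2 3 4]
print(y2)  # [10 21 32 40]
```

If the first entry of each run should be kept instead, pad at the front: `np.concatenate(([True], x[1:] != x[:-1]))`.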
Upvotes: 2
Views: 1542
Reputation: 22882
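I believe the fastest method is to compare `x` with itself shifted by one element, using NumPy's element-wise `==` operator:

```python
idx = x[:-1] == x[1:]
```

This gives `True` at every position `i` where `x[i] == x[i+1]`.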
On my machine, with `x` holding a million random integers in [0, 100]:

```
In [15]: timeit idx = x[:-1] == x[1:]
1000 loops, best of 3: 1 ms per loop
```
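To reproduce the comparison outside IPython, here is a rough benchmarking sketch using the standard `timeit` module. The function names `create_mask_loop` and `create_mask_vectorized` are hypothetical, and the test data is an assumption: a million random integers in [0, 100], sorted so that duplicates are adjacent as in the question. It first checks that the loop and the vectorized mask agree, then times each; absolute numbers will depend on the machine.

```python
import timeit
import numpy as np

def create_mask_loop(x):
    # Pure-Python loop from the question (np.ones instead of np.empty)
    idx = np.ones(x.shape, dtype=bool)
    for i in range(len(x) - 1):
        if x[i + 1] == x[i]:
            idx[i] = False
    return idx

def create_mask_vectorized(x):
    # Same mask from one shifted comparison, padded with True at the end
    return np.append(x[:-1] != x[1:], True)

rng = np.random.default_rng(0)
# Assumed test data: one million sorted random integers in [0, 100]
x = np.sort(rng.integers(0, 101, size=1_000_000))

# Both masks must agree before their speed is compared
assert np.array_equal(create_mask_loop(x), create_mask_vectorized(x))

for fn in (create_mask_loop, create_mask_vectorized):
    # Best of 3 single runs, reported in milliseconds
    t = min(timeit.repeat(lambda: fn(x), number=1, repeat=3))
    print(f"{fn.__name__}: {t * 1000:.2f} ms")
```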
Upvotes: 4