Reputation: 5358
Say one has the following numpy array:
X = numpy.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
Now, how can one exclude the ranges X[0:2], X[6:8] and X[12:14] from the array X at once, so that the result is X = [2, 2, 2, 4, 4, 4]?
Upvotes: 3
Views: 3375
Reputation: 541
Not sure this is helpful, but if each range holds a single value that appears nowhere else, you can select ranges by their position among the unique values.
import numpy as np
X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
A = np.unique(X)
Out[79]: array([1, 2, 3, 4, 5])
Here we want to keep the second and fourth ranges, so:
X = X[(X==A[1])|(X==A[3])]
Out[82]: array([2, 2, 2, 4, 4, 4])
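The same value-based selection can be sketched with np.isin, which builds the mask in one call. This is only an illustration; the second array Y is a made-up example to show the caveat that values, not positions, are being matched.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# Keep every element whose value is 2 or 4, wherever it sits in the array.
kept = X[np.isin(X, [2, 4])]

# Caveat: this selects by value, not by position. If a value repeats
# outside its range, those elements are kept as well.
Y = np.array([2, 2, 9, 9, 2])  # hypothetical array for illustration
kept_y = Y[np.isin(Y, [2])]
```

So this only answers the question when each range can be identified by its value.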
Upvotes: 0
Reputation: 231385
In a comment on @unutbu's answer I suggested np.delete. Here are a few timings.
A larger test array:
In [445]: A=np.arange(1000)
@unutbu's answer:
In [446]: timeit A[~np.in1d(np.arange(len(A)), (np.r_[10:50:3,100:200,300:350]))].shape
1000 loops, best of 3: 454 µs per loop
The same index list, but using np.delete, is about 3x faster:
In [447]: timeit np.delete(A,np.r_[10:50:3,100:200,300:350]).shape
10000 loops, best of 3: 166 µs per loop
But straightforward boolean masking is even faster. Earlier I deduced that np.delete does basically this, but it must have some added overhead (including the ability to handle multiple dimensions):
In [448]: %%timeit
ind=np.ones_like(A,bool)
ind[np.r_[10:50:3,100:200,300:350]]=False
A[ind].shape
.....:
10000 loops, best of 3: 71.5 µs per loop
np.delete has a different strategy when the input is a slice, which may be faster than boolean indexing. But it only handles one slice at a time, hence the nested delete that @Kasramvd shows. I intend to add that timing.
Concatenating multiple slices is another option.
np.r_ also involves a loop, but it is only over the slices. Basically it iterates over the slices, expanding each as a range, and concatenates them. In my fastest case it is responsible for 2/3 of the run time:
In [451]: timeit np.r_[10:50:3,100:200,300:350]
10000 loops, best of 3: 41 µs per loop
In [453]: %%timeit x=np.r_[10:50:3,100:200,300:350]
ind=np.ones_like(A,bool)
ind[x]=False
A[ind].shape
.....:
10000 loops, best of 3: 24.2 µs per loop
The nested delete has pretty good performance:
In [457]: timeit np.delete( np.delete( np.delete(A,slice(300,350)),
slice(100,200)),slice(10,50,3)).shape
10000 loops, best of 3: 108 µs per loop
np.delete, when given a slice to delete, copies slices to the result array (the blocks before and after the delete block). I can approximate that by concatenating several slices. I'm cheating here by using delete for the 1st block, rather than taking the time to write a pure copy. Still, it is faster than the best boolean mask expression.
In [460]: timeit np.concatenate([np.delete(A[:100],slice(10,50,3)),
A[200:300],A[350:]]).shape
10000 loops, best of 3: 65.7 µs per loop
I can remove the delete with this slicing, though the order of the elements kept from the 10:50 range is messed up. I suspect that this is, theoretically, the fastest:
In [480]: timeit np.concatenate([A[:10], A[11:50:3], A[12:50:3],
A[50:100], A[200:300], A[350:]]).shape
100000 loops, best of 3: 16.1 µs per loop
An important caution - these alternatives are being tested with non-overlapping slices. Some may work with overlaps, others might not.
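As a sanity check, here is a minimal sketch confirming that the approaches timed above agree for these non-overlapping slices. The variable names are mine, and np.ones(A.shape, dtype=bool) stands in for np.ones_like(A, bool); no timings are claimed.

```python
import numpy as np

A = np.arange(1000)
idx = np.r_[10:50:3, 100:200, 300:350]

# 1. np.delete with an index array
via_delete = np.delete(A, idx)

# 2. Boolean mask: start all True, switch off the excluded positions
mask = np.ones(A.shape, dtype=bool)
mask[idx] = False
via_mask = A[mask]

# 3. Nested delete, last slice first so earlier offsets stay valid
via_nested = np.delete(
    np.delete(np.delete(A, slice(300, 350)), slice(100, 200)),
    slice(10, 50, 3),
)

# 4. Concatenating the kept slices; elements kept from 10:50 come out reordered
via_concat = np.concatenate(
    [A[:10], A[11:50:3], A[12:50:3], A[50:100], A[200:300], A[350:]]
)
```

The first three give identical arrays. The concatenation keeps the same elements, but only matches the others after sorting.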
Upvotes: 1
Reputation: 6633
Just compose X from the intervals you want to keep:
X = np.array(list(X[3:6]) + list(X[9:12]))
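The same idea can be sketched without the round trip through Python lists, either by concatenating the kept slices or by indexing with the kept positions. The names kept and kept_via_index are just for illustration.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# Concatenate the slices to keep; stays in NumPy the whole time.
kept = np.concatenate([X[3:6], X[9:12]])

# Equivalent: build the kept indices with np.r_ and index once.
kept_via_index = X[np.r_[3:6, 9:12]]
```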
Upvotes: 0
Reputation: 879421
You could use np.r_ to combine the ranges into a 1D array:
In [18]: np.r_[0:2,6:8,12:14]
Out[18]: array([ 0, 1, 6, 7, 12, 13])
Then use np.in1d to create a boolean array which is True at those index locations:
In [19]: np.in1d(np.arange(len(X)), (np.r_[0:2,6:8,12:14]))
Out[19]:
array([ True, True, False, False, False, False, True, True, False,
False, False, False, True, True, False], dtype=bool)
And then use ~ to invert the boolean array:
In [11]: X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
In [12]: X[~np.in1d(np.arange(len(X)), (np.r_[0:2,6:8,12:14]))]
Out[12]: array([1, 2, 2, 2, 3, 4, 4, 4, 5])
Note that X[12:14] captures only the first two 5's. There is one 5 left over, so the result is array([1, 2, 2, 2, 3, 4, 4, 4, 5]), not array([1, 2, 2, 2, 3, 4, 4, 4]).
Slice ranges in Python are half-open intervals. The left index is included, but the right index is not. So X[12:14] selects X[12] and X[13], but not X[14]. See this post for Guido van Rossum's explanation for why Python uses half-open intervals.
To get the result [2, 2, 2, 4, 4, 4] you would need to add one to the right-hand (ending) index for each slice:
In [17]: X[~np.in1d(np.arange(len(X)), (np.r_[0:3,6:9,12:15]))]
Out[17]: array([2, 2, 2, 4, 4, 4])
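The steps above could be wrapped in a small helper. This is only a sketch: exclude_ranges is a hypothetical name, the ranges are assumed to be half-open (start, stop) pairs, and np.isin is used in place of np.in1d, which does the same job here.

```python
import numpy as np

def exclude_ranges(arr, ranges):
    """Return arr without the elements in the half-open (start, stop) ranges."""
    # Expand every range into explicit indices, like np.r_ does for slices.
    drop = np.concatenate([np.arange(start, stop) for start, stop in ranges])
    # True where the position should be kept.
    keep = ~np.isin(np.arange(len(arr)), drop)
    return arr[keep]

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# Ranges as written in the question: one element of each group survives.
short = exclude_ranges(X, [(0, 2), (6, 8), (12, 14)])

# Right-hand index bumped by one: whole groups are removed.
full = exclude_ranges(X, [(0, 3), (6, 9), (12, 15)])
```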
Upvotes: 4
Reputation: 107287
You can call np.delete three times and, as @nneonneo said in a comment, do it in reverse order, so there is no need to recalculate the range offsets:
>>> np.delete(np.delete(np.delete(X,np.s_[12:14]),np.s_[6:8]),np.s_[0:2])
array([1, 2, 2, 2, 3, 4, 4, 4, 5])
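To illustrate why the reverse order is convenient, here is a small sketch. Deleting front to back works too, but each later slice has to be shifted left by the number of elements already removed. The variable names are only for illustration.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# Back to front: earlier positions are untouched, so the original
# slice bounds can be used as-is.
reverse_order = np.delete(
    np.delete(np.delete(X, np.s_[12:14]), np.s_[6:8]), np.s_[0:2]
)

# Front to back: after removing 2 elements, 6:8 becomes 4:6;
# after removing 4 elements, 12:14 becomes 8:10.
forward_shifted = np.delete(
    np.delete(np.delete(X, np.s_[0:2]), np.s_[4:6]), np.s_[8:10]
)

# A single call also works if the slices are expanded to indices first.
single_call = np.delete(X, np.r_[0:2, 6:8, 12:14])
```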
Upvotes: 0
Reputation: 3551
You can use something like this:
numbers = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
# Positions to drop; the question asks about index ranges, so filter on the index, not the value
exclude = set(range(0, 2)) | set(range(6, 8)) | set(range(12, 14))
[n for i, n in enumerate(numbers) if i not in exclude]
or:
[n for i, n in enumerate(numbers) if i not in range(0, 2) and i not in range(6, 8) and i not in range(12, 14)]
result:
[1, 2, 2, 2, 3, 4, 4, 4, 5]
Upvotes: 1