Reputation: 5459
I have the following array:
a = np.array([6,5,4,3,4,5,6])
Now I want to get all elements that are greater than 4 and also have an index of at least 2. The way I found to do this is the following:
a[2:][a[2:]>4]
Is there a better or more readable way to accomplish this?
UPDATE: This is a simplified version. In reality, the start index is computed arithmetically from several variables, like this:
a[len(trainPredict)+(look_back*2)+1:][a[len(trainPredict)+(look_back*2)+1:]>4]
trainPredict is a NumPy array and look_back is an integer.
I wanted to see if there is an established way or how others do that.
Upvotes: 3
Views: 2645
Reputation: 114578
@AlexanderCécile's answer is not only more legible than the one-liner you posted, but also removes the redundant computation of a temporary array. Despite that, it does not appear to be any faster than your original approach.
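One plausible reason for the missing speedup, offered as an illustration rather than a measured explanation: a basic slice such as a[2:] is a view and not a copy, so writing it twice costs very little. A minimal sketch using the array from the question (the variable names are chosen for this example only):

```python
import numpy as np

a = np.array([6, 5, 4, 3, 4, 5, 6])

s = a[2:]  # basic slicing returns a view onto a's buffer

# The view shares memory with the original array; no data is copied.
is_view = np.shares_memory(a, s)
base_is_a = s.base is a

# The boolean mask and the final selection do allocate new arrays.
mask = s > 4
result = s[mask]
result_is_copy = not np.shares_memory(a, result)

print(is_view, base_is_a, result_is_copy)
```

The expensive steps, the comparison and the boolean selection, happen exactly once in both spellings, which would be consistent with the near-identical timings.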
The timings below are all run with a preliminary setup of
import numpy as np
np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=N)
N varies from 1e3 to 1e8 in factors of 10. I tried four variants of the code:
result = a[2:][a[2:] > 4]                        # method 1
s = a[2:]; result = s[s > 4]                     # method 2
result = a[np.flatnonzero(a[2:] > 4) + 2]        # method 3
result = a[(a > 4) & (np.arange(a.size) >= 2)]   # method 4
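As a sanity check, the four variants can be compared on the small array from the question. This is only a sketch: the names r1 through r4 are made up for the example, and variant 3 is written on the assumption that the `> 4` comparison is applied before np.flatnonzero.

```python
import numpy as np

a = np.array([6, 5, 4, 3, 4, 5, 6])

# Variant 1: slice twice, then mask.
r1 = a[2:][a[2:] > 4]

# Variant 2: name the slice once, then mask it.
s = a[2:]
r2 = s[s > 4]

# Variant 3: positions of the matches within the slice, shifted back by 2.
# Assumes the comparison `> 4` is applied before np.flatnonzero.
r3 = a[np.flatnonzero(a[2:] > 4) + 2]

# Variant 4: combine the value mask with an index mask over the full array.
r4 = a[(a > 4) & (np.arange(a.size) >= 2)]

all_equal = all(np.array_equal(r1, r) for r in (r2, r3, r4))
print(r1, all_equal)
```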
In all cases, the timing was obtained on the command line by running
python -m timeit -s 'import numpy as np; np.random.seed(0xDEADBEEF); a = np.random.randint(8, size=N)' '<X>'
Here, N was a power of 10 from 1e3 to 1e8, and <X> was one of the expressions above. Timings are as follows:
Methods #1 and #2 are virtually indistinguishable. Surprisingly, between roughly 5e3 and 1e6 elements, method #3 is slightly but noticeably faster, which I would not normally expect from fancy indexing. Method #4 is, unsurprisingly, the slowest at every size, since it builds an index array and boolean masks over the entire array.
Here is the data, for completeness:
N          CodePope  AlexanderCécile  MadPhysicist1  MadPhysicist2
1000       3.77e-06  3.69e-06         5.48e-06       6.52e-06
10000      4.6e-05   4.59e-05         3.97e-05       5.93e-05
100000     0.000484  0.000483         0.0004         0.000592
1000000    0.00513   0.00515          0.00503        0.00675
10000000   0.0529    0.0525           0.0617         0.102
100000000  0.657     0.658            0.782          1.09
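For anyone who wants to rerun the comparison without the shell, a rough equivalent using the standard-library timeit module could look like the following. The loop counts, the dictionary labels and the small N are choices made for this sketch, not the settings used for the numbers above, and method 3 is written with the `> 4` comparison inside np.flatnonzero.

```python
import timeit

import numpy as np

N = 10_000  # kept small for a quick run; the table covers 1e3 through 1e8
np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=N)

# Labels are arbitrary; the statements mirror the four methods.
statements = {
    "method 1": "a[2:][a[2:] > 4]",
    "method 2": "s = a[2:]; s[s > 4]",
    "method 3": "a[np.flatnonzero(a[2:] > 4) + 2]",
    "method 4": "a[(a > 4) & (np.arange(a.size) >= 2)]",
}

loops = 100
timings = {}
for label, stmt in statements.items():
    # Best of 3 repeats, converted to seconds per loop.
    runs = timeit.repeat(stmt, globals={"a": a, "np": np}, repeat=3, number=loops)
    timings[label] = min(runs) / loops
    print(f"{label}: {timings[label]:.3g} s per loop")
```

Absolute numbers will differ by machine and NumPy version, so only the relative ordering is worth comparing.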
Upvotes: 1
Reputation: 2702
If you're worried about the complexity of the slice and/or the number of conditions, you can always separate them:
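import numpy as np

a = np.array([6, 5, 4, 3, 4, 5, 6])
a_slice = a[2:]          # elements from index 2 onward
cond_1 = a_slice > 4     # boolean mask on the slice
res = a_slice[cond_1]    # -> array([5, 6])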
Is your example very simplified? There might be better solutions for more complex manipulations.
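For the longer index expression in the question's update, the same idea might look like the sketch below: compute the start offset once, give it a name, and reuse it. The values assigned to trainPredict, look_back and a are hypothetical stand-ins, since the real data is not shown.

```python
import numpy as np

# Hypothetical stand-ins for the question's variables.
trainPredict = np.zeros(3)
look_back = 1
a = np.array([6, 5, 4, 3, 4, 5, 6, 3, 7, 2, 8])

# Compute the offset once and give it a name.
start = len(trainPredict) + (look_back * 2) + 1  # 3 + 2 + 1 = 6

tail = a[start:]        # view of everything from `start` onward
res = tail[tail > 4]    # keep only the values greater than 4

# Same result as the one-liner from the question.
one_liner = a[len(trainPredict)+(look_back*2)+1:][a[len(trainPredict)+(look_back*2)+1:]>4]
same = np.array_equal(res, one_liner)
print(start, res, same)
```

Naming the offset keeps the arithmetic in one place, so a later change to the formula only has to be made once.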
Upvotes: 2