Code Pope

Reputation: 5459

Numpy: get array where index greater than value and condition is true

I have the following array:

a = np.array([6,5,4,3,4,5,6])

Now I want to get all elements that are greater than 4 and also have an index of at least 2. The way I found to do that is the following:

a[2:][a[2:]>4]

Is there a better or more readable way to accomplish this?

UPDATE: This is a simplified version. In reality the indexing is done with arithmetic operation over several variables like this:

a[len(trainPredict)+(look_back*2)+1:][a[len(trainPredict)+(look_back*2)+1:]>4]

trainPredict is a numpy array, look_back an integer.
I wanted to see if there is an established way or how others do that.

Upvotes: 3

Views: 2645

Answers (2)

Mad Physicist

Reputation: 114578

@AlexanderCécile's answer is not only more legible than the one-liner you posted, but also removes the redundant computation of a temporary array. Despite that, it does not appear to be any faster than your original approach.
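As a small illustration of that point (a sketch using the question's sample array, not a benchmark): basic slicing returns a view, so naming the slice once costs no copy, and the one-liner simply builds the same slice object twice.

```python
import numpy as np

a = np.array([6, 5, 4, 3, 4, 5, 6])

# Basic slicing returns a view onto the same memory, so no data is copied.
s = a[2:]
shares = np.shares_memory(a, s)

# The comparison allocates a new boolean mask, and boolean indexing returns a copy.
mask = s > 4
result = s[mask]

# The original one-liner creates the slice twice but selects the same elements.
one_liner = a[2:][a[2:] > 4]

print(shares, result, one_liner)
```

Both forms select the same elements; the named slice mainly helps readability.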

The timings below are all run with a preliminary setup of

import numpy as np
np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=N)

N varies from 1e3 to 1e8 in factors of 10. I tried four variants of the code:

  1. CodePope: result = a[2:][a[2:] > 4]
  2. AlexanderCécile: s = a[2:]; result = s[s > 4]
  3. MadPhysicist1: result = a[np.flatnonzero(a[2:] > 4) + 2]
  4. MadPhysicist2: result = a[(a > 4) & (np.arange(a.size) >= 2)]
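A quick sanity check that the variants select the same elements can be sketched as follows. Variant 3 is written here with the explicit `> 4` comparison inside `np.flatnonzero`, and the array size of 1000 is just an arbitrary small choice for the check.

```python
import numpy as np

np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=1000)  # small N, chosen only for this check

# 1. Slice twice, then mask.
r1 = a[2:][a[2:] > 4]

# 2. Name the slice once, then mask.
s = a[2:]
r2 = s[s > 4]

# 3. Positions of matches within the slice, shifted back by the slice offset.
r3 = a[np.flatnonzero(a[2:] > 4) + 2]

# 4. Combine the value condition and the index condition in one full-size mask.
r4 = a[(a > 4) & (np.arange(a.size) >= 2)]

all_equal = all(np.array_equal(r1, r) for r in (r2, r3, r4))
print(all_equal)
```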

In all cases, the timing was obtained on the command line by running

python -m timeit -s 'import numpy as np; np.random.seed(0xDEADBEEF); a = np.random.randint(8, size=N)' '<X>'

Here, N was 10**k for k from 3 to 8, and <X> was one of the expressions above. Timings are as follows:

[Image: plot of the timings for the four methods]

Methods #1 and #2 are virtually indistinguishable. What is surprising is that between roughly 5e3 and 1e6 elements, method #3 is slightly but noticeably faster. I would not normally expect that from fancy indexing. Method #4 is, as expected, the slowest, since it builds a full-size index array and several full-size boolean masks before selecting.

Here is the data, for completeness (rows are N, values are time per loop in seconds):

           CodePope  AlexanderCécile  MadPhysicist1  MadPhysicist2
1000       3.77e-06         3.69e-06       5.48e-06       6.52e-06
10000       4.6e-05         4.59e-05       3.97e-05       5.93e-05
100000     0.000484         0.000483         0.0004       0.000592
1000000     0.00513          0.00515        0.00503        0.00675
10000000     0.0529           0.0525         0.0617          0.102
100000000     0.657            0.658          0.782           1.09
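For anyone who prefers to reproduce such measurements from within Python rather than from the shell, a rough sketch using the standard `timeit` module follows. The helper `time_variants` and its `repeat`/`number` defaults are hypothetical choices for illustration, not what produced the numbers above, and variant 3 is written with the explicit `> 4` comparison.

```python
import timeit

# Same setup as the command-line runs, with the size filled in per call.
SETUP = (
    "import numpy as np; np.random.seed(0xDEADBEEF); "
    "a = np.random.randint(8, size={n})"
)

VARIANTS = {
    "CodePope": "a[2:][a[2:] > 4]",
    "AlexanderCécile": "s = a[2:]; s[s > 4]",
    "MadPhysicist1": "a[np.flatnonzero(a[2:] > 4) + 2]",
    "MadPhysicist2": "a[(a > 4) & (np.arange(a.size) >= 2)]",
}

def time_variants(n, repeat=3, number=100):
    """Return the best per-call time in seconds for each variant at size n."""
    results = {}
    for name, stmt in VARIANTS.items():
        timer = timeit.Timer(stmt, setup=SETUP.format(n=n))
        # Each entry from repeat() is the total time for `number` calls.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[name] = best / number
    return results

timings = time_variants(1000)
for name, seconds in timings.items():
    print(f"{name:>16}: {seconds:.3g} s")
```

Absolute numbers will differ by machine and NumPy version, so only the relative ordering is worth comparing.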

Upvotes: 1

AMC

Reputation: 2702

If you're worried about the complexity of the slice and/or the number of conditions, you can always separate them:

a = np.array([6,5,4,3,4,5,6])

a_slice = a[2:]

cond_1 = a_slice > 4

res = a_slice[cond_1]

Is your example very simplified? There might be better solutions for more complex manipulations.
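Applied to the longer indexing expression from the question's update, the same separation might look like the sketch below. The values given to trainPredict, look_back and a are made-up stand-ins, since the real data is not shown.

```python
import numpy as np

# Hypothetical stand-ins for the variables named in the question.
trainPredict = np.zeros(1)
look_back = 1
a = np.array([6, 5, 4, 3, 4, 5, 6, 7, 2, 9])

# Compute the start offset once instead of repeating the arithmetic twice.
start = len(trainPredict) + (look_back * 2) + 1

# Name the slice (a view, no copy), then apply the condition to it.
tail = a[start:]
res = tail[tail > 4]

print(start, res)
```

Computing the offset once also avoids the risk of the two copies of the expression drifting apart when one of them is edited.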

Upvotes: 2
