Reputation: 1107

How could I get numpy array indices by some conditions

I have run into the following problem. Suppose I have arrays like this:

a = np.array([[1,2,3,4,5,4,3,2,1],])
label = np.array([[1,0,1,0,0,1,1,0,1],])

I need the index into a of the position where label is 1 and the value of a is the largest among all positions where label is 1.

This may be confusing, so take the example above: the indices where label is 1 are 0, 2, 5, 6, 8, and the corresponding values of a are 1, 3, 4, 3, 1. The largest of these is 4, so I need the result 5, which is the index of that 4 in a. How can I do this with numpy?

Upvotes: 3

Answers (3)

Divakar

Reputation: 221684

Get the indices of the 1s in label, say as idx, then index into a with them, take the argmax within that subset, and finally trace it back to the original order by indexing into idx -

idx = np.flatnonzero(label==1)
out = idx[a[idx].argmax()]

Sample run -

# Assuming inputs to be 1D
In [18]: a
Out[18]: array([1, 2, 3, 4, 5, 4, 3, 2, 1])

In [19]: label
Out[19]: array([1, 0, 1, 0, 0, 1, 1, 0, 1])

In [20]: idx = np.flatnonzero(label==1)

In [21]: idx[a[idx].argmax()]
Out[21]: 5
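
The same two steps can be wrapped into a small helper. This is only a sketch: the function name argmax_where, the flattening of 2D inputs, and the error raised when no label is 1 are assumptions added for illustration, not part of the answer above -

```python
import numpy as np

def argmax_where(a, label):
    """Index of the largest value of `a` among positions where `label` == 1."""
    # Flatten so the (1, n)-shaped arrays from the question also work.
    a = np.asarray(a).ravel()
    label = np.asarray(label).ravel()
    idx = np.flatnonzero(label == 1)      # positions where label is 1
    if idx.size == 0:
        # Assumption: treat "no labeled element" as an error.
        raise ValueError("no position has label == 1")
    # argmax is relative to the subset a[idx]; idx maps it back to a.
    return int(idx[a[idx].argmax()])

a = np.array([[1, 2, 3, 4, 5, 4, 3, 2, 1]])
label = np.array([[1, 0, 1, 0, 0, 1, 1, 0, 1]])
result = argmax_where(a, label)
print(result)
```

Because the selection is done by indexing rather than arithmetic, this variant does not depend on the sign or dtype of a.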

For a as ints and label as an array of 0s and 1s, we could optimize further by offsetting the labeled elements by more than the range of values in a, so that every labeled element ends up above every unlabeled one, like so -

(label*(a.max()-a.min()+1) + a).argmax()

Furthermore, if a has positive numbers only, it would simplify to -

(label*(a.max()+1) + a).argmax()
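
To see why the offset works, here is a small worked example with negative values. The sample arrays are made up for illustration. Adding a.max()-a.min()+1 to the labeled elements lifts each of them above every unlabeled element while keeping their relative order, so a plain argmax lands on the largest labeled value -

```python
import numpy as np

a = np.array([-3, 7, -1, 2, -8])
label = np.array([1, 0, 1, 0, 1])

span = a.max() - a.min() + 1      # 7 - (-8) + 1 = 16
shifted = label * span + a        # labeled slots get +16, the rest stay as-is
scaled_result = int(shifted.argmax())

print(shifted)        # [13  7 15  2  8]
print(scaled_result)  # 2, the position of -1, the largest labeled value
```

Two caveats seem worth keeping in mind: if no element of label is 1, the expression silently falls back to the plain argmax of a, and for values close to the limits of the integer dtype the added offset could overflow.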

Timings on a largish a of non-negative ints -

In [115]: np.random.seed(0)
     ...: a = np.random.randint(0,10,(100000))
     ...: label = np.random.randint(0,2,(100000))

In [117]: %%timeit
     ...: idx = np.flatnonzero(label==1)
     ...: out = idx[a[idx].argmax()]
1000 loops, best of 3: 592 µs per loop

In [116]: %timeit (label*(a.max()-a.min()+1) + a).argmax()
1000 loops, best of 3: 357 µs per loop

# @coldspeed's soln
In [120]: %timeit np.ma.masked_where(~label.astype(bool), a).argmax()
1000 loops, best of 3: 1.63 ms per loop

# won't work with negative numbers in a
In [119]: %timeit (label*(a.max()+1) + a).argmax()
1000 loops, best of 3: 292 µs per loop

# @klim's soln (won't work with negative numbers in a)
In [121]: %timeit np.argmax(a * (label == 1))
1000 loops, best of 3: 229 µs per loop

Upvotes: 3

klim

Reputation: 1269

Here is one of the simplest ways.

>>> np.argmax(a * (label == 1))
5
>>> np.argmax(a * (label == 1), axis=1)
array([5])

Coldspeed's masked-array method may take more time.
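
As noted in the timings of another answer, the multiplication trick does not work when a contains negative numbers. A small made-up example shows the failure, followed by one possible workaround using np.where. The workaround is an assumption added here, not part of this answer, and it presumes an integer a whose minimum is not the smallest value of its dtype -

```python
import numpy as np

a = np.array([-3, 7, -1, 2, -8])
label = np.array([1, 0, 1, 0, 1])

# Multiplying zeroes out the unlabeled slots, and 0 beats every negative value.
naive = int(np.argmax(a * (label == 1)))   # picks index 1, where label is 0

# Hypothetical variant: fill unlabeled slots with a value below a.min() instead of 0.
safe = int(np.argmax(np.where(label == 1, a, a.min() - 1)))

print(naive, safe)
```

Here naive points at an unlabeled position, while safe points at the largest labeled value.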

Upvotes: 1

cs95

Reputation: 403020

You can use masked arrays:

>>> np.ma.masked_where(~label.astype(bool), a).argmax()
5
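
A short sketch of what the one-liner does, using made-up arrays with negative values: np.ma.masked_where hides every entry where the condition is True, and argmax on the masked array then skips the hidden entries. Writing the condition as label != 1 is an equivalent spelling chosen here for readability when label holds only 0s and 1s -

```python
import numpy as np

a = np.array([-3, 7, -1, 2, -8])
label = np.array([1, 0, 1, 0, 1])

# Mask (hide) every position whose label is not 1.
masked = np.ma.masked_where(label != 1, a)
masked_result = int(masked.argmax())

print(masked)         # [-3 -- -1 -- -8]
print(masked_result)  # 2

# The arrays from the question, which are 2D with a single row.
a2 = np.array([[1, 2, 3, 4, 5, 4, 3, 2, 1]])
label2 = np.array([[1, 0, 1, 0, 0, 1, 1, 0, 1]])
question_result = int(np.ma.masked_where(~label2.astype(bool), a2).argmax())
```

Without an axis argument, argmax works on the flattened array, which is why the 2D input from the question still yields a single index.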

Upvotes: 1
