Reputation: 119
My question is very similar to this one: How to get indices of N maximum values in a numpy array?
But I would like to get the indices in the same order I find them.
Let's take the example marked in that question as the correct solution:
import numpy as np
arr = np.array([1, 3, 2, 4, 5])
arr.argsort()[-3:][::-1]
array([4, 3, 1])
The result I'm looking for instead should be like:
array([1, 3, 4])
Upvotes: 2
Views: 7269
Reputation: 8378
Use numpy.argpartition()
:
k = 3
np.argpartition(arr, len(arr) - k)[-k:]
Adjust k
index to whatever you need.
NOTE: returned indices are not guaranteed to be in the "sorted order" - just that anything past index k
is larger than the value at position k
in the sorted array.
NOTE 2: If you do need the returned indices to be sorted themselves, then simply add a numpy.sort()
to the above command:
np.sort(np.argpartition(arr, len(arr) - k)[-k:])
numpy.argpartition()
provides significant performance gains over full sort
especially for large arr
. In the above example you do a full sort only over the selected indices (not all).
Upvotes: 4
Reputation: 53029
It probably depends a bit on the sizes of a
and k
but often the fastest appears to be combining partition
with flatnonzero
or where
:
>>> a = np.random.random(10000)
>>> k = 5
>>>
>>> timeit("np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])", globals=globals(), number=10000)
0.8328661819687113
>>> timeit("np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])", globals=globals(), number=10000)
1.0577796879806556
>>> np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])
array([2527, 4299, 5531, 6945, 7174])
>>> np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])
array([2527, 4299, 5531, 6945, 7174])
Note 1: this highlights the significant performance cost of indirect indexing.
Note 2: as we only use the pivot element and discard the actual partition percentile
should in theory be at least as fast but in practice it is way slower.
Upvotes: 2