Alex Punkallo
Alex Punkallo

Reputation: 119

Get indices of N maximum values in a numpy array without sorting them?

My question is very similar to this one: How to get indices of N maximum values in a numpy array?

But I would like to get the indices in the same order I find them.

Let's take the example marked in that question as the correct solution:

import numpy as np
arr = np.array([1, 3, 2, 4, 5])
arr.argsort()[-3:][::-1]

array([4, 3, 1])

The result I'm looking for instead should be like:

array([1, 3, 4])

Upvotes: 2

Views: 7269

Answers (2)

AGN Gazer
AGN Gazer

Reputation: 8378

Use numpy.argpartition():

k = 3
np.argpartition(arr, len(arr) - k)[-k:]

Adjust k index to whatever you need.

NOTE: returned indices are not guaranteed to be in the "sorted order" - just that anything past index k is larger than the value at position k in the sorted array.

NOTE 2: If you do need the returned indices to be sorted themselves, then simply add a numpy.sort() to the above command:

np.sort(np.argpartition(arr, len(arr) - k)[-k:])

numpy.argpartition() provides significant performance gains over full sort especially for large arr. In the above example you do a full sort only over the selected indices (not all).

Upvotes: 4

Paul Panzer
Paul Panzer

Reputation: 53029

It probably depends a bit on the sizes of a and k but often the fastest appears to be combining partition with flatnonzero or where:

>>> a = np.random.random(10000)
>>> k = 5
>>> 
>>> timeit("np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])", globals=globals(), number=10000)
0.8328661819687113
>>> timeit("np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])", globals=globals(), number=10000)
1.0577796879806556
>>> np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])
array([2527, 4299, 5531, 6945, 7174])
>>> np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])
array([2527, 4299, 5531, 6945, 7174])

Note 1: this highlights the significant performance cost of indirect indexing.

Note 2: as we only use the pivot element and discard the actual partition percentile should in theory be at least as fast but in practice it is way slower.

Upvotes: 2

Related Questions