Reputation: 1

Partition and sorting of a numpy array

>> import numpy as np
>> arr = [10, 11, 4, 3, 5, 7, 9, 2, 13]
>> np.partition(np.array(arr), -3)

array([ 9,  5,  4,  3,  2,  7, 10, 11, 13])

>> np.sort(np.partition(np.array(arr), -3)[-4:])

array([ 7, 10, 11, 13])

>> np.argpartition(np.array(arr), -3)

array([6, 4, 2, 3, 7, 5, 0, 1, 8], dtype=int64)

>> np.sort(np.argpartition(np.array(arr), -3)[-4:])

array([0, 1, 5, 8], dtype=int64)

What is going on in this code? I have gone through the documentation, but I could not work out what these calls do to the actual numbers.

Upvotes: 0

Answers (1)

Valdi_Bo

Reputation: 30991

Naming a plain Python list arr is misleading. At that point it is just a list; the arrays are only created later.

To see what is going on, it helps to divide the code into steps and save each partial result in a separate variable. This is how I have rewritten your code.

So let's start from:

lst = [10, 11, 4, 3, 5, 7, 9, 2, 13]

The second step is to create an array from this list:

arr1 = np.array(lst)

I named this array (and the ones that follow) "arr" followed by a consecutive number.

The third step is to partition arr1, placing the "threshold" element at the third position from the end:

arr2 = np.partition(arr1, -3)

The result is:

array([ 9,  5,  4,  3,  2,  7, 10, 11, 13])

Details:

The "threshold" element (10) is located at the third position from the end, which is where it would be in a fully sorted array.
All preceding elements are smaller than the threshold (or equal to it, when the array contains duplicates).
All following elements are greater than or equal to the threshold.
The order of the elements within each group, before and after the "threshold" element, is undefined.
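The properties listed above can be checked directly. This is only a minimal sketch using the data from the question; the names k, threshold, same_as_sorted, left_ok and right_ok are introduced here for illustration.

```python
import numpy as np

arr1 = np.array([10, 11, 4, 3, 5, 7, 9, 2, 13])
k = -3                         # same kth as in the question
arr2 = np.partition(arr1, k)

threshold = arr2[k]            # element that landed at position k

# The threshold is the value a full sort would put at position k.
same_as_sorted = bool(threshold == np.sort(arr1)[k])
# Everything before position k is <= threshold.
left_ok = bool(np.all(arr2[:k] <= threshold))
# Everything from position k onward is >= threshold.
right_ok = bool(np.all(arr2[k:] >= threshold))

print(threshold, same_as_sorted, left_ok, right_ok)  # 10 True True True
```

The exact order inside each group may differ between NumPy versions, so the checks compare against the threshold rather than against a fixed output array.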

Then you take the last 4 elements of arr2. This is one more than the three elements that the partition places at or after the threshold:

arr3 = arr2[-4:]

No surprise, the result is:

array([ 7, 10, 11, 13])

The next step is to sort them:

arr4 = np.sort(arr3)

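This time nothing changes: arr4 has the same content as arr3, because arr3 happened to be in ascending order already. In general the partitioned elements are not ordered, so the sort is not redundant.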

That completes the experiments with partition. The second part experiments with argpartition:

arr5 = np.argpartition(arr1, -3)

The result is:

array([6, 4, 2, 3, 7, 5, 0, 1, 8], dtype=int64)

It is an array of indices to arr1.

Details:

The third element from the end (0) is the index of the "threshold" element in arr1 (its value is 10).
All preceding elements are indices of smaller (or equal) elements in arr1.
All following elements are indices of greater or equal elements in arr1.
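One way to convince yourself that these are indices into arr1 is to use them for indexing. The sketch below assumes the same data as in the question; reordered and threshold are names chosen here for illustration.

```python
import numpy as np

arr1 = np.array([10, 11, 4, 3, 5, 7, 9, 2, 13])
arr5 = np.argpartition(arr1, -3)

# Indexing arr1 with the index array gives a partitioned arrangement of arr1.
reordered = arr1[arr5]
threshold = reordered[-3]

print(threshold)                                   # 10
print(bool(np.all(reordered[:-3] <= threshold)))   # True
print(bool(np.all(reordered[-3:] >= threshold)))   # True
```

So argpartition returns the positions, while partition returns the values found at those positions.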

Then you take the last 4 elements of arr5:

arr6 = arr5[-4:]

getting:

array([5, 0, 1, 8], dtype=int64)

And the last step is to sort them:

arr7 = np.sort(arr6)

getting (no surprise):

array([0, 1, 5, 8], dtype=int64)
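Note that index 5 points at the value 7, which is not the fourth largest element of arr1 (that is 9): a partition at -3 only guarantees the last three positions. If the goal was the n largest elements, a common pattern is to slice exactly n positions and then order them by value. This is a sketch under that assumption, not necessarily what the question intended; n, top_idx and top_vals are illustrative names.

```python
import numpy as np

arr1 = np.array([10, 11, 4, 3, 5, 7, 9, 2, 13])
n = 3  # how many of the largest elements to keep (illustrative choice)

# Indices of the n largest elements, in no particular order.
top_idx = np.argpartition(arr1, -n)[-n:]
# Reorder those indices so the values they point at are ascending.
top_idx = top_idx[np.argsort(arr1[top_idx])]
top_vals = arr1[top_idx]

print(top_idx.tolist())   # [0, 1, 8]
print(top_vals.tolist())  # [10, 11, 13]
```

Sorting the indices themselves, as np.sort(arr6) does, orders them by position in arr1 rather than by the values they refer to.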

That's all.

Upvotes: 1
