user2831683

Reputation: 987

numpy.partition() with 2-D Array

numpy.partition() also rearranges the elements inside each row of the array.

I have been trying to do a simple sort of the rows based on the first element of each row.

import numpy as np
a = np.array([[5.2, 4.3], [200.2, 6.2], [1.4, 112.2]])
np.partition(a, (1, a.shape[1] - 1), axis=1)

Output:

array([[   4.3,    5.2],
       [   6.2,  200.2],
       [   1.4,  112.2]])

I don't understand how np.partition() works here. Are there any resources that explain numpy.partition() in detail?

Specifically, I want to modify the arguments of the method to generate the following output:

array([[   1.4,  112.2],
       [   5.2,    4.3],
       [ 200.2,    6.2]])

Upvotes: 2

Views: 3203

Answers (2)

ali_m

Reputation: 74262

If I understand correctly, you just want to sort the rows in your array according to the values in the first column. You can do this using np.argsort:

# get an array of indices that will sort the first column in ascending order
order = np.argsort(a[:, 0])

# index into the row dimension of a
a_sorted = a[order]

print(a_sorted)
# [[   1.4  112.2]
#  [   5.2    4.3]
#  [ 200.2    6.2]]
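The question asks whether the arguments to np.partition() can simply be changed to get this result. As a minimal sketch of why that does not work: partitioning (or sorting) along axis=0 treats each column independently, so values from different rows get mixed together, whereas indexing with the argsort result moves whole rows. The names `by_column` and `rows_sorted` are just illustrative.

```python
import numpy as np

a = np.array([[5.2, 4.3], [200.2, 6.2], [1.4, 112.2]])

# axis=0 partitions each COLUMN on its own, breaking rows apart:
# by_column holds [[1.4, 4.3], [5.2, 6.2], [200.2, 112.2]],
# and e.g. the pair (1.4, 4.3) never appeared together in `a`.
by_column = np.partition(a, 1, axis=0)

# Reordering whole rows with fancy indexing keeps each row intact.
rows_sorted = a[np.argsort(a[:, 0])]
```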

If you want a partial sort rather than a full sort, you could use np.argpartition in much the same way:

# a slightly larger example array in order to better illustrate what
# argpartition does
b = np.array([[  5.2,   4.3],
              [200.2,   6.2],
              [  3.6,  85.1],
              [  1.4, 112.2],
              [ 12.8,  60.0],
              [  7.6,  23.4]])

# get a set of indices to reorder the rows of `b` such that the value at
# position 2 of the reordered first column is in its final 'sorted' position,
# and all smaller or larger values are placed before and after it respectively
partial_order = np.argpartition(b[:, 0], 2)

# the first (2+1) elements in the first column are guaranteed to be no larger
# than the rest, but apart from that their order is arbitrary
print(b[partial_order])
# [[   1.4  112.2]
#  [   3.6   85.1]
#  [   5.2    4.3]
#  [ 200.2    6.2]
#  [  12.8   60. ]
#  [   7.6   23.4]]
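A common follow-on pattern, sketched here under the assumption that you want the k rows with the smallest first-column values *in sorted order*: call argpartition first, then argsort only those k indices. This avoids fully sorting every row, which can matter for large arrays. `k`, `idx` and `smallest_rows` are illustrative names.

```python
import numpy as np

b = np.array([[  5.2,   4.3],
              [200.2,   6.2],
              [  3.6,  85.1],
              [  1.4, 112.2],
              [ 12.8,  60.0],
              [  7.6,  23.4]])

k = 3  # number of rows wanted (illustrative choice)

# Step 1: partial sort; the first k indices point at the k smallest
# first-column values, in no particular order.
idx = np.argpartition(b[:, 0], k - 1)[:k]

# Step 2: fully sort only those k indices by their first-column value.
idx = idx[np.argsort(b[idx, 0])]

# smallest_rows holds [1.4, 112.2], [3.6, 85.1], [5.2, 4.3]
smallest_rows = b[idx]
```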

Upvotes: 2

Alex Riley

Reputation: 177088

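np.partition() ensures that the values at the indices you pass (the kth argument) are the same as they would be if the array were fully sorted (e.g. with np.sort). Every value before such an index is less than or equal to the value at it, and every value after it is greater than or equal; beyond that, the order of the other values is not guaranteed to be anything meaningful.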
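The guarantee can be checked directly on a small 1-D example (a sketch; `x`, `kth` and `p` are illustrative names):

```python
import numpy as np

x = np.array([5.2, 200.2, 3.6, 1.4, 12.8, 7.6])
kth = 2
p = np.partition(x, kth)

# The value at index kth is exactly what a full sort would put there...
assert p[kth] == np.sort(x)[kth]
# ...everything before it is no larger and everything after it is no smaller,
# but the order within each side is unspecified.
assert np.all(p[:kth] <= p[kth])
assert np.all(p[kth + 1:] >= p[kth])
```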

The axis=1 argument means that this operation will be applied individually to each row.
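To illustrate the per-row behaviour with a wider array than the one in the question (the array `m` is made up for this sketch):

```python
import numpy as np

m = np.array([[9, 1, 7, 3, 5],
              [2, 8, 4, 6, 0]])
kth = 2
p = np.partition(m, kth, axis=1)

# Each row is partitioned on its own: column `kth` holds that row's
# third-smallest value.
print(p[:, kth])  # [5 4]

# Every row still contains exactly its original values, just rearranged.
assert np.array_equal(np.sort(p, axis=1), np.sort(m, axis=1))
```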

Here, the indices you've passed are (1, a.shape[1]-1), which is equivalent to (1, 1) in this case. Repeating an index has no special meaning, so in each row, the value in the second column (index 1) will be the same as if that row were in sorted order.
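A quick sketch confirming both points on the array from the question: the repeated index behaves like a single index, and with only two columns, fixing index 1 also fixes index 0, so each row ends up fully sorted.

```python
import numpy as np

a = np.array([[5.2, 4.3], [200.2, 6.2], [1.4, 112.2]])

# kth=(1, a.shape[1] - 1) is (1, 1) here; the repeated index adds nothing.
p_repeated = np.partition(a, (1, a.shape[1] - 1), axis=1)
p_single = np.partition(a, 1, axis=1)
assert np.array_equal(p_repeated, p_single)

# With two columns, partitioning at index 1 is the same as sorting each row.
assert np.array_equal(p_single, np.sort(a, axis=1))
```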

When the operation is applied, you can see in the returned array that the larger value in each of the first and second rows has been moved to this second column. The third row was already in sorted order and so is unchanged.

This is really all there is to the function; the NumPy documentation covers a few further details. If you're feeling particularly brave, you can find the source code implementing the introselect algorithm used by np.partition(), in all its glory, in the NumPy repository.
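For intuition only, here is a minimal quickselect in plain Python that produces the same kind of result as np.partition() on a 1-D list. This is a simplified stand-in, not NumPy's code: introselect builds on quickselect but adds safeguards against worst-case behaviour that this sketch omits. `quickselect_partition` is a hypothetical helper name.

```python
import random


def quickselect_partition(values, kth):
    """Return a copy of `values` partitioned around index `kth`.

    Simplified quickselect with a Lomuto partition step and a random pivot;
    NOT NumPy's actual implementation.
    """
    arr = list(values)
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        # Pick a random pivot and move it to the end of the active range.
        p = random.randint(lo, hi)
        arr[p], arr[hi] = arr[hi], arr[p]
        pivot = arr[hi]
        # Lomuto partition: values smaller than the pivot move left of `store`.
        store = lo
        for i in range(lo, hi):
            if arr[i] < pivot:
                arr[i], arr[store] = arr[store], arr[i]
                store += 1
        # The pivot now goes to its final sorted position.
        arr[store], arr[hi] = arr[hi], arr[store]
        # Continue only in the side that still contains index kth.
        if store == kth:
            break
        elif store < kth:
            lo = store + 1
        else:
            hi = store - 1
    return arr


values = [5.2, 200.2, 3.6, 1.4, 12.8, 7.6]
part = quickselect_partition(values, 2)
print(part[2])  # 5.2
```

Unlike a full sort, each pass only continues into one side of the pivot, which is why partitioning is cheaper than sorting on average.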

Upvotes: 3
