Reputation:
How do I sort a NumPy array by its nth column?
For example, given:
a = array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
I want to sort the rows of a by the second column to obtain:
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
Upvotes: 517
Views: 553773
Reputation: 333
As the Python documentation wiki suggests:
a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda a_entry: a_entry[1])
print(a)
Output:
[[0, 0, 1], [1, 2, 3], [4, 5, 6]]
Upvotes: 24
Reputation: 383
Simply use sorted, passing the index of the column you want to sort by.
import numpy as np
a = np.array([[1,1], [1,-1], [-1,1], [-1,-1]])
print (a)
a = a.tolist()
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print (a)
Upvotes: 0
Reputation: 590
Pandas approach, just for completeness:
import numpy as np
import pandas as pd
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
a = pd.DataFrame(a)
a.sort_values(1, ascending=True).to_numpy()  # 1 is the column label, i.e. the second column
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
prl900 ran a benchmark comparing this with the accepted answer:
%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
Upvotes: 2
Reputation: 704
Thanks to this post: https://stackoverflow.com/a/5204280/13890678
I found a more "generic" answer using a structured array. I think one advantage of this method is that the code is easier to read.
import numpy as np
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
struct_a = np.rec.fromarrays(  # np.rec is the public alias of np.core.records
a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")
print(struct_a)
[(7, 0, 5) (9, 2, 3) (4, 5, 6)]
Upvotes: 0
Reputation: 1
def sort_np_array(x, column=0, flip=False):
    # Sort the rows of x by the given column; flip=True reverses the result (descending order).
    x = x[np.argsort(x[:, column])]
    if flip:
        x = np.flip(x, axis=0)
    return x
Array in the original question:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
The result of the sort_np_array function, as expected by the author of the question:
sort_np_array(a, column=1, flip=False)
Out[2]: array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
Upvotes: 0
Reputation: 1
# sort the rows by the first column (index 0)
indexofsort=np.argsort(dataset[:,0],axis=-1,kind='stable')
dataset = dataset[indexofsort,:]
Upvotes: 0
Reputation: 343
import numpy as np
a=np.array([[21,20,19,18,17],[16,15,14,13,12],[11,10,9,8,7],[6,5,4,3,2]])
y=np.argsort(a[:,2],kind='mergesort')# a[:,2]=[19,14,9,4]
a=a[y]
print(a)
The output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]].
Note that argsort(numArray) returns the indices that would sort numArray, i.e. the positions of its elements listed in sorted order.
For example:
x=np.array([8,1,5])
z=np.argsort(x) # [1,2,0] are the indices that sort x
print(x[z]) # integer-array indexing, which reorders x using the indices saved in z
The result is [1,5,8].
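Applied to the array from the question, the same idea can be sketched as follows. This is only a minimal illustration; the names order and sorted_a are made up for the example.

```python
import numpy as np

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# Indices that would sort the second column (index 1), whose values are [2, 5, 0].
order = np.argsort(a[:, 1])
print(order)      # [2 0 1]

# Integer-array indexing reorders the rows; a itself is left unchanged.
sorted_a = a[order]
print(sorted_a)
```

The one-liner a[a[:, 1].argsort()] used elsewhere in this thread is the same two steps combined.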
Upvotes: 4
Reputation: 284552
@steve's answer is actually the most elegant way of doing it.
For the "correct" way see the order keyword argument of numpy.ndarray.sort
However, you'll need to view your array as an array with fields (a structured array).
The "correct" way is quite ugly if you didn't initially define your array with fields...
As a quick example, to sort it and return a copy:
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])
In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)  # np.int was removed in NumPy 1.24
Out[3]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
To sort it in-place:
In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None
In [7]: a
Out[7]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
@Steve's really is the most elegant way to do it, as far as I know...
The only advantage to this method is that the "order" argument is a list of the fields to order the sort by. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
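A small sketch of that multi-field ordering, on a made-up array that contains ties in the second column. The dtype is set explicitly to int64 here (an assumption of this example) so that the 'i8,i8,i8' view lines up on every platform.

```python
import numpy as np

# Hypothetical data; rows 0, 1 and 3 tie on the second column.
a = np.array([[3, 1, 2],
              [1, 1, 0],
              [2, 0, 5],
              [0, 1, 0]], dtype=np.int64)

# View each row as one record with fields f0, f1, f2, then sort by
# column 1, breaking ties with column 2, then column 0.
records = a.view('i8,i8,i8')
result = np.sort(records, order=['f1', 'f2', 'f0'], axis=0).view(np.int64)
print(result)
```

The view only works when the array is contiguous and every column has the same 8-byte dtype, which is part of why this route feels clumsy for plain 2D arrays.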
Upvotes: 182
Reputation: 12397
This is an old question, but if you need to generalize this to arrays with more than two dimensions, here is a solution that can be easily generalized:
np.einsum('ij->ij', a[a[:,1].argsort(),:])
This is overkill for two dimensions, and a[a[:,1].argsort()] would be enough per @steve's answer; however, that answer cannot be generalized to higher dimensions. You can find an example with a 3D array in this question.
Output:
[[7 0 5]
[9 2 3]
[4 5 6]]
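For a 3D case, one possible approach is sketched below. Note that it swaps in np.take_along_axis rather than the einsum call above, and the array stack is invented for the example: it sorts the rows of every 2D table in the stack by that table's second column.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stack of four 5x3 tables.
stack = rng.integers(0, 100, size=(4, 5, 3))

# For every table, the row order that sorts its column 1.
order = stack[:, :, 1].argsort(axis=1, kind='stable')      # shape (4, 5)

# Gather whole rows; the trailing axis of length 1 broadcasts over the columns.
sorted_stack = np.take_along_axis(stack, order[:, :, None], axis=1)

# Reference result: sort each 2D table separately with the 2D idiom.
expected = np.stack([t[t[:, 1].argsort(kind='stable')] for t in stack])
print(np.array_equal(sorted_stack, expected))
```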
Upvotes: 0
Reputation: 380
Here is another solution that considers all columns (a more compact version of J.J's answer):
ar=np.array([[0, 0, 0, 1],
[1, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[0, 0, 1, 0],
[1, 1, 0, 0]])
Sort with lexsort,
ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]
Output:
array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[1, 0, 1, 0],
[1, 1, 0, 0]])
Upvotes: 1
Reputation: 1084
I had a similar problem.
My Problem:
I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors. My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.
So I want to sort a two-dimensional array column-wise by the first row in descending order.
My Solution
a = a[::, a[0,].argsort()[::-1]]
So how does this work?
a[0,] is just the first row I want to sort by.
Now I use argsort to get the order of indices.
I use [::-1] because I need descending order.
Lastly I use a[::, ...] to get a copy with the columns in the right order (indexing with an integer array returns a copy, not a view).
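A small worked example of these steps, with made-up numbers standing in for the eigenvalues (first row) and eigenvectors (the columns below them):

```python
import numpy as np

# First row: hypothetical eigenvalues; each column below holds the matching vector.
a = np.array([[ 2.,  5.,  1.],
              [10., 20., 30.],
              [11., 21., 31.]])

order = a[0].argsort()[::-1]   # column indices, largest first-row value first
result = a[:, order]           # reorder the columns; this is a new array
print(order)
print(result)
```

Because the index is an integer array, result does not share memory with a, so later changes to one do not affect the other.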
Upvotes: 8
Reputation: 3249
From the NumPy mailing list, here's another solution:
>>> a
array([[1, 2],
[0, 0],
[1, 0],
[0, 2],
[2, 1],
[1, 0],
[1, 0],
[0, 0],
[1, 0],
[2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
[0, 0],
[0, 2],
[1, 0],
[1, 0],
[1, 0],
[1, 0],
[1, 2],
[2, 1],
[2, 2]])
Upvotes: 26
Reputation: 3607
You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:
a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]
This sorts by column 0, then 1, then 2.
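As a sanity check, the three-pass approach can be compared with np.lexsort on random data. This is only a sketch; the test array and the names b and c are invented here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small value range so that ties in every column are likely.
a = rng.integers(0, 3, size=(20, 3))

# Three passes, least significant column first; the later passes must be stable.
b = a[a[:, 2].argsort()]
b = b[b[:, 1].argsort(kind='mergesort')]
b = b[b[:, 0].argsort(kind='mergesort')]

# lexsort takes its keys in the same order: the last key is the primary one.
c = a[np.lexsort((a[:, 2], a[:, 1], a[:, 0]))]

print(np.array_equal(b, c))
```

Both routes give the same rows here because all three columns take part in the ordering, so any rows that still tie are identical.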
Upvotes: 62
Reputation: 231325
A slightly more complicated lexsort example: descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T) and gives priority to the last.
In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]:
array([[1, 2, 1],
[3, 1, 2],
[1, 1, 3],
[2, 3, 4],
[3, 2, 5],
[2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]:
array([[3, 1, 2],
[3, 2, 5],
[2, 1, 6],
[2, 3, 4],
[1, 1, 3],
[1, 2, 1]])
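The same ordering can be spelled out with one key per line, which may be easier to read than the multiplication trick. This is a sketch using the array b from above; negating a column to get descending order assumes signed numeric data.

```python
import numpy as np

b = np.array([[1, 2, 1], [3, 1, 2], [1, 1, 3], [2, 3, 4], [3, 2, 5], [2, 1, 6]])

# Keys are listed from least to most significant:
#   b[:, 1]   ascending second column (tie-breaker)
#   -b[:, 0]  negated first column, so ascending order means descending values
order = np.lexsort((b[:, 1], -b[:, 0]))
result = b[order]
print(result)
```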
Upvotes: 3
Reputation: 4179
In case someone wants to make use of sorting at a critical part of their programs here's a performance comparison for the different proposals:
import numpy as np
table = np.random.rand(5000, 10)
%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop
%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
So, it looks like indexing with argsort is the quickest method so far...
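Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper names below are made up, the structured sort runs on a copy (so the copy is included in its timing), and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
table = rng.random((5000, 10))

def sort_structured(t):
    # Sort a copy in place through a structured view, ordering by the last field.
    t = t.copy()
    t.view(','.join(['f8'] * t.shape[1])).sort(order=['f9'], axis=0)
    return t

def sort_argsort(t):
    # Index the rows with the argsort of the last column.
    return t[t[:, 9].argsort()]

for fn in (sort_structured, sort_argsort):
    seconds = timeit.timeit(lambda: fn(table), number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e6:.0f} µs per call")
```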
Upvotes: 26