Reputation: 3048
Suppose I have an M x N numpy array where each row represents a data entry, the first N-1 columns represent different parameters (independent variables), and the last column represents the data I'm interested in (the dependent variable).
What's the most elegant way to re-arrange different rows such that they are sorted by the parameters?
Example:
# original
1 0.1 20 0.30000000000000004 0.07819319717404902
1 1 10 0.2 0.07550707294415204
2 0.1 0 0 0.07078663749666488
2 0.1 0 0.1 0.07284943819285646
1 1 15 0.4 0.08047398714777267
1 1 15 0.5 0.0820402298018169
1 1 15 0.30000000000000004 0.07819319717406738
1 1 20 0 0.07079655446543297
1 1 20 0.1 0.07286704639139795
1 1 5 0.4 0.086521872154
# desired:
1 0.1 20 0.30000000000000004 0.07819319717404902
1 1 5 0.4 0.086521872154
1 1 10 0.2 0.07550707294415204
1 1 15 0.30000000000000004 0.07819319717406738
1 1 15 0.4 0.08047398714777267
1 1 15 0.5 0.0820402298018169
1 1 20 0 0.07079655446543297
1 1 20 0.1 0.07286704639139795
2 0.1 0 0 0.07078663749666488
2 0.1 0 0.1 0.07284943819285646
I want the rows sorted in ascending order by each parameter in turn, with the first column as the primary key.
Upvotes: 2
Views: 85
Reputation: 4487
Given the following matrix:
import numpy as np

m = np.array([[5., 0.1, 3.4],
              [7., 0.3, 6.8],
              [3., 0.2, 5.6]])
This code sorts the matrix m based on column 0:
m[m[:,0].argsort(kind='mergesort')]
Result:
array([[3. , 0.2, 5.6],
[5. , 0.1, 3.4],
[7. , 0.3, 6.8]])
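The kind='mergesort' argument requests a stable sort, which only matters when the sort column contains ties: tied rows then keep their original relative order. A minimal sketch, assuming a made-up array m2 (not from the question) with repeated values in column 0:

```python
import numpy as np

# Hypothetical array with ties in column 0 (illustration only).
m2 = np.array([[2., 0.9],
               [1., 0.5],
               [2., 0.1],
               [1., 0.7]])

# A stable sort keeps tied rows in their original relative order.
order = m2[:, 0].argsort(kind='mergesort')
sorted_m2 = m2[order]

print(order)      # [1 3 0 2]
print(sorted_m2)
```

Rows 1 and 3 (both with 1. in column 0) come first in their original order, followed by rows 0 and 2.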
Given:
a = np.array([[1,20,200], [1,30,100], [1,10,300]])
array([[ 1, 20, 200],
[ 1, 30, 100],
[ 1, 10, 300]])
Order by column 1, then by column 0:
a[np.lexsort((a[:,0],a[:,1]))]
# output:
array([[ 1, 10, 300],
[ 1, 20, 200],
[ 1, 30, 100]])
NOTE: The last key passed to lexsort (or the last row, if keys is a 2D array) is the primary sort key.
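Because the key order is easy to get backwards, here is a small sketch of the same call with the keys swapped. The array b is made up for illustration and is not part of the question:

```python
import numpy as np

# Hypothetical array with ties in both columns (illustration only).
b = np.array([[2, 10], [1, 20], [1, 10], [2, 5]])

# Last key is primary: sort by column 0, break ties with column 1.
by_col0_then_col1 = b[np.lexsort((b[:, 1], b[:, 0]))]

# Keys swapped: sort by column 1, break ties with column 0.
by_col1_then_col0 = b[np.lexsort((b[:, 0], b[:, 1]))]

print(by_col0_then_col1)  # rows: [1 10], [1 20], [2 5], [2 10]
print(by_col1_then_col0)  # rows: [2 5], [1 10], [2 10], [1 20]
```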
Order by all columns (starting from the right):
a[np.lexsort((a[:,0], a[:,1],a[:,2]))]
# output:
array([[ 1, 30, 100],
[ 1, 20, 200],
[ 1, 10, 300]])
Or equivalently, order by all columns without specifying them manually (the rightmost column of the matrix is the primary key):
a[np.lexsort(list(map(tuple,np.column_stack(a))))]
# output:
array([[ 1, 30, 100],
[ 1, 20, 200],
[ 1, 10, 300]])
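Since lexsort also accepts a 2D array of keys (one key per row), the transpose can be passed directly, and reversing its rows makes column 0 the primary key, which is the ordering the question asks for. A short sketch on the same array a:

```python
import numpy as np

a = np.array([[1, 20, 200], [1, 30, 100], [1, 10, 300]])

# a.T has one row per column of a, so the last column is the primary key.
right_to_left = a[np.lexsort(a.T)]

# Reversing the rows of a.T makes column 0 the primary key instead.
left_to_right = a[np.lexsort(a.T[::-1])]

print(right_to_left)  # rows: [1 30 100], [1 20 200], [1 10 300]
print(left_to_right)  # rows: [1 10 300], [1 20 200], [1 30 100]
```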
Another option is to switch to pandas. It works, but it is several orders of magnitude slower. Here are some tests on execution times:
Benchmark data:
a = np.array([[1,20,200]*1000, [1,30,100]*1000, [1,10,300]*1000])
Pandas version:
%%timeit
pd.DataFrame(a).sort_values(list(range(a.shape[1]))).values
# 3.66 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Numpy version:
%%timeit
a[np.lexsort((a[:,0], a[:,1],a[:,2]))]
# 39.6 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
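As you can see, you go from the microseconds of numpy to the seconds of the pandas-based version (about 100,000 times slower, going by the timings above). Note that the benchmark array has 3 rows and 3000 columns, so the pandas call sorts on 3000 key columns while the lexsort call uses only three; the two timings are not a like-for-like comparison.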
The choice is yours :)
Upvotes: 1
Reputation: 150735
One option that uses pandas's sort_values:
pd.DataFrame(a).sort_values(list(range(a.shape[1]))).values
Output:
array([[ 1. , 0.1 , 20. , 0.3 , 0.0781932 ],
[ 1. , 1. , 5. , 0.4 , 0.08652187],
[ 1. , 1. , 10. , 0.2 , 0.07550707],
[ 1. , 1. , 15. , 0.3 , 0.0781932 ],
[ 1. , 1. , 15. , 0.4 , 0.08047399],
[ 1. , 1. , 15. , 0.5 , 0.08204023],
[ 1. , 1. , 20. , 0. , 0.07079655],
[ 1. , 1. , 20. , 0.1 , 0.07286705],
[ 2. , 0.1 , 0. , 0. , 0.07078664],
[ 2. , 0.1 , 0. , 0.1 , 0.07284944]])
Upvotes: 1
Reputation: 53029
You can use lexsort:
original[np.lexsort(np.rot90(original))]
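Why this works: np.rot90 turns the columns into rows in reverse order, so the first column ends up as the last key, which lexsort treats as primary. A sketch on the question's data, where the variable name order is mine:

```python
import numpy as np

# The rows from the question, in their original order.
original = np.array([
    [1, 0.1, 20, 0.30000000000000004, 0.07819319717404902],
    [1, 1, 10, 0.2, 0.07550707294415204],
    [2, 0.1, 0, 0, 0.07078663749666488],
    [2, 0.1, 0, 0.1, 0.07284943819285646],
    [1, 1, 15, 0.4, 0.08047398714777267],
    [1, 1, 15, 0.5, 0.0820402298018169],
    [1, 1, 15, 0.30000000000000004, 0.07819319717406738],
    [1, 1, 20, 0, 0.07079655446543297],
    [1, 1, 20, 0.1, 0.07286704639139795],
    [1, 1, 5, 0.4, 0.086521872154],
])

# rot90 gives the columns as rows, last column first, so column 0 is the last key.
order = np.lexsort(np.rot90(original))
result = original[order]

print(order)  # [0 9 1 6 4 5 7 8 2 3]
```

The resulting row order matches the desired output in the question. np.rot90(original) is the same array as original.T[::-1], so either spelling can be used.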
Upvotes: 1