Reputation: 51
I have searched around for a solution to what seems to be a simple problem, but have come up with nothing. The problem is to sort the rows of a matrix by its columns, one column after another: first by the first column, then breaking ties with the second, and so on. So, if I have a numpy matrix like:
import numpy as np
X=np.matrix([[0,0,1,2],[0,0,1,1],[0,0,0,4],[0,0,0,3],[0,1,2,5]])
print(X)
[[0 0 1 2]
[0 0 1 1]
[0 0 0 4]
[0 0 0 3]
[0 1 2 5]]
I would like to sort it based on the first column, then the second, the third, and so on, to get a result like:
Xsorted=np.matrix([[0,0,0,3],[0,0,0,4],[0,0,1,1],[0,0,1,2],[0,1,2,5]])
print(Xsorted)
[[0 0 0 3]
[0 0 0 4]
[0 0 1 1]
[0 0 1 2]
[0 1 2 5]]
While I think it is possible to sort a matrix like this by naming the columns explicitly, I would prefer a sorting method that doesn't depend on how many columns the matrix has. I am using Python 3.4, if that is important.
Any help would be greatly appreciated!
Upvotes: 3
Views: 422
Reputation: 17797
It's not going to be particularly fast, but you can always convert your rows to tuples, then use Python's sort:
np.matrix(sorted(map(tuple, X.A)))
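A self-contained version of the tuple approach is sketched below. It assumes a plain NumPy array instead of np.matrix (NumPy's documentation no longer recommends np.matrix, and iterating over an ndarray yields its rows directly, so .A is not needed). The variable names are only illustrative.

```python
import numpy as np

# Data from the question, kept as a plain ndarray rather than np.matrix
X = np.array([[0, 0, 1, 2],
              [0, 0, 1, 1],
              [0, 0, 0, 4],
              [0, 0, 0, 3],
              [0, 1, 2, 5]])

# Python compares tuples element by element, so sorting the rows as tuples
# orders them by column 0, then column 1, and so on.
rows_sorted = sorted(map(tuple, X))

# Rebuild a 2-D array from the sorted list of row tuples
X_sorted = np.array(rows_sorted)

print(X_sorted)
```

Because this goes through Python objects row by row, it works for any number of columns but gets slower as the array grows.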
You can also use np.lexsort, as suggested in this answer to a somewhat related question:
X[np.lexsort(X.T[::-1])]
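The [::-1] is there because np.lexsort treats the last key it is given as the primary sort key, so the columns must be passed in reverse order for the first column to take priority. A minimal sketch wrapping this in a helper that works for any number of columns (sort_rows_lex is a hypothetical name, and the input is assumed to be convertible to a 2-D ndarray):

```python
import numpy as np

def sort_rows_lex(a):
    """Return the rows of a 2-D array sorted by column 0, then column 1, etc.

    np.lexsort uses the *last* key as the primary one, so the columns are
    passed in reverse order: a.T[::-1] puts column 0 last.
    """
    a = np.asarray(a)
    order = np.lexsort(a.T[::-1])  # row indices in sorted order
    return a[order]

X = np.array([[0, 0, 1, 2],
              [0, 0, 1, 1],
              [0, 0, 0, 4],
              [0, 0, 0, 3],
              [0, 1, 2, 5]])

print(sort_rows_lex(X))
```

If you need an np.matrix back, wrap the result with np.matrix(...). np.lexsort is a stable sort, so identical rows keep their original relative order.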
The lexsort approach appears to be faster, though you should test with your actual data to make sure:
In [20]: X = np.matrix(np.random.randint(10, size=(100,100)))
In [21]: %timeit np.matrix(sorted(map(tuple, X.A)))
100 loops, best of 3: 2.23 ms per loop
In [22]: %timeit X[np.lexsort(X.T[::-1])]
1000 loops, best of 3: 1.22 ms per loop
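The %timeit lines above need IPython. A rough equivalent for a plain Python script, using the standard timeit module, is sketched below; it also checks that both methods produce the same ordering before timing them. The helper names tuple_sort and lexsort_rows are hypothetical, and the numbers it prints depend on your machine and data, so treat it as a starting point for your own measurements rather than a benchmark.

```python
import timeit
import numpy as np

# Random 100x100 integers in [0, 10), as in the IPython timing
np.random.seed(0)
A = np.random.randint(10, size=(100, 100))

def tuple_sort(a):
    # Pure-Python sort of the rows as tuples
    return np.array(sorted(map(tuple, a)))

def lexsort_rows(a):
    # lexsort uses the last key as primary, so reverse the column order
    return a[np.lexsort(a.T[::-1])]

# Both methods should give the same rows in the same order
assert np.array_equal(tuple_sort(A), lexsort_rows(A))

for name, func in [("tuple sort", tuple_sort), ("lexsort", lexsort_rows)]:
    # Best of 3 repeats, 50 calls each; report milliseconds per call
    best = min(timeit.repeat(lambda: func(A), number=50, repeat=3))
    print("{}: {:.3f} ms per call".format(name, best / 50 * 1000))
```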
Upvotes: 2
Reputation: 1232
Here is a solution using pandas:
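import pandas
data = [[0,0,1,2],[0,0,1,1],[0,0,0,4],[0,0,0,3],[0,1,2,5]]
x = pandas.DataFrame(data)
# columns to sort by, in priority order
# (DataFrame.sort has been removed from pandas; sort_values replaces it)
z = x.sort_values([0,1,2,3])
# to_numpy() replaces the removed as_matrix(); use .values on pandas older than 0.24
output = z.to_numpy()
output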
array([[0, 0, 0, 3],
[0, 0, 0, 4],
[0, 0, 1, 1],
[0, 0, 1, 2],
[0, 1, 2, 5]])
Upvotes: 1