Reputation: 7464
This question asks about sorting 1D arrays, two solutions are given, first one suggests using argsort
twice, second, more time-efficient, uses it only once. What if I wanted to rank 2D array rowwise like this?
Using argsort
twice is one possibility:
def rank(x, axis=-1):
return np.argsort(np.argsort(x, axis=axis), axis=axis)
For the data:
x = np.array([
[1, 2, 30],
[4, 5, 6],
[90, 8, 7],
[12, 15, 10]
])
it returns correct retults:
rank(x, axis=0)
## array([[0, 0, 3],
## [1, 1, 0],
## [3, 2, 1],
## [2, 3, 2]])
rank(x, axis=1)
## array([[0, 1, 2],
## [0, 1, 2],
## [2, 1, 0],
## [1, 2, 0]])
But is there more efficient approach?
Upvotes: 2
Views: 922
Reputation: 221524
Using only once
with advanced-indexing
-
sidx = np.argsort(x, axis=1)
# Store shape info
m,n = x.shape
# Initialize output array
out = np.empty((m,n),dtype=int)
# Use sidx as column indices, while a range array for the row indices
# to select one element per row. Since sidx is a 2D array of indices
# we need to use a 2D extended range array for the row indices
out[np.arange(m)[:,None], sidx] = np.arange(n)
For getting the ranks along each column, simply change the indexing step to use sidx
as the row indices, while the range array for the column indices automatically broadcasts. Also, the values to be assigned are to be extended to 2D for the values to be broadcasted before assignments :
sidx = np.argsort(x, axis=0)
out[sidx, np.arange(n)] = np.arange(m)[:,None]
Timings on 5k x 5k
array
To see the improvements with the proposed method, since both are using the result of the first argsort
, let's pre-compute it and then time rest of the steps -
In [248]: x = np.random.rand(5000,5000)
In [249]: axis = 1
In [250]: sidx = np.argsort(x, axis=1)
In [251]: %timeit np.argsort(sidx, axis=axis)
1 loop, best of 3: 1.31 s per loop
In [252]: %%timeit
...: m,n = x.shape
...: out = np.empty((m,n),dtype=int)
...: out[np.arange(m)[:,None], sidx] = np.arange(n)
10 loops, best of 3: 156 ms per loop
Seeing a speedup of 8x+
with the proposed one for such an array.
Upvotes: 4
Reputation: 53029
Here is a 2D version of the better of the linked solutions.
def rank(x, axis=-1):
m, n = x.shape
res = np.empty((m, n), dtype=int)
I = np.ogrid[:m, :n]
rng, I[axis] = I[axis], x.argsort(axis=axis)
res[I] = rng
return res
And an ND version:
def rank(x, axis=-1):
res = np.empty(x.shape, dtype=int)
I = np.ogrid[tuple(map(slice, x.shape))]
rng, I[axis] = I[axis], x.argsort(axis=axis)
res[I] = rng
return res
Upvotes: 3