Rank 2D array rowwise

Question

This question asks about sorting 1D arrays, two solutions are given, first one suggests using argsort twice, second, more time-efficient, uses it only once. What if I wanted to rank 2D array rowwise like this?

Using argsort twice is one possibility:

def rank(x, axis=-1):
   return np.argsort(np.argsort(x, axis=axis), axis=axis)

For the data:

x = np.array([
    [1,  2,  30],
    [4,  5,  6],
    [90, 8,  7],
    [12, 15, 10]
])

it returns correct retults:

rank(x, axis=0)
## array([[0, 0, 3],
##        [1, 1, 0],
##        [3, 2, 1],
##        [2, 3, 2]])

rank(x, axis=1)
## array([[0, 1, 2],
##        [0, 1, 2],
##        [2, 1, 0],
##        [1, 2, 0]])

But is there more efficient approach?

Divakar · Accepted Answer

Using only once with advanced-indexing -

sidx = np.argsort(x, axis=1)

# Store shape info
m,n = x.shape

# Initialize output array
out = np.empty((m,n),dtype=int)

# Use sidx as column indices, while a range array for the row indices
# to select one element per row. Since sidx is a 2D array of indices
# we need to use a 2D extended range array for the row indices
out[np.arange(m)[:,None], sidx] = np.arange(n)

For getting the ranks along each column, simply change the indexing step to use sidx as the row indices, while the range array for the column indices automatically broadcasts. Also, the values to be assigned are to be extended to 2D for the values to be broadcasted before assignments :

sidx = np.argsort(x, axis=0)
out[sidx, np.arange(n)] = np.arange(m)[:,None]

Timings on 5k x 5k array

To see the improvements with the proposed method, since both are using the result of the first argsort, let's pre-compute it and then time rest of the steps -

In [248]: x = np.random.rand(5000,5000)

In [249]: axis = 1

In [250]: sidx = np.argsort(x, axis=1)

In [251]: %timeit np.argsort(sidx, axis=axis)
1 loop, best of 3: 1.31 s per loop

In [252]: %%timeit
     ...: m,n = x.shape
     ...: out = np.empty((m,n),dtype=int)
     ...: out[np.arange(m)[:,None], sidx] = np.arange(n)
10 loops, best of 3: 156 ms per loop

Seeing a speedup of 8x+ with the proposed one for such an array.

Rank 2D array rowwise

Answers (2)

Related Questions