Tim
Tim

Reputation: 7464

Rank 2D array rowwise

This question asks about sorting 1D arrays, two solutions are given, first one suggests using argsort twice, second, more time-efficient, uses it only once. What if I wanted to rank 2D array rowwise like this?

Using argsort twice is one possibility:

def rank(x, axis=-1):
   return np.argsort(np.argsort(x, axis=axis), axis=axis)

For the data:

x = np.array([
    [1,  2,  30],
    [4,  5,  6],
    [90, 8,  7],
    [12, 15, 10]
])

it returns correct retults:

rank(x, axis=0)
## array([[0, 0, 3],
##        [1, 1, 0],
##        [3, 2, 1],
##        [2, 3, 2]])

rank(x, axis=1)
## array([[0, 1, 2],
##        [0, 1, 2],
##        [2, 1, 0],
##        [1, 2, 0]])

But is there more efficient approach?

Upvotes: 2

Views: 922

Answers (2)

Divakar
Divakar

Reputation: 221524

Using only once with advanced-indexing -

sidx = np.argsort(x, axis=1)

# Store shape info
m,n = x.shape

# Initialize output array
out = np.empty((m,n),dtype=int)

# Use sidx as column indices, while a range array for the row indices
# to select one element per row. Since sidx is a 2D array of indices
# we need to use a 2D extended range array for the row indices
out[np.arange(m)[:,None], sidx] = np.arange(n)

For getting the ranks along each column, simply change the indexing step to use sidx as the row indices, while the range array for the column indices automatically broadcasts. Also, the values to be assigned are to be extended to 2D for the values to be broadcasted before assignments :

sidx = np.argsort(x, axis=0)
out[sidx, np.arange(n)] = np.arange(m)[:,None]

Timings on 5k x 5k array

To see the improvements with the proposed method, since both are using the result of the first argsort, let's pre-compute it and then time rest of the steps -

In [248]: x = np.random.rand(5000,5000)

In [249]: axis = 1

In [250]: sidx = np.argsort(x, axis=1)

In [251]: %timeit np.argsort(sidx, axis=axis)
1 loop, best of 3: 1.31 s per loop

In [252]: %%timeit
     ...: m,n = x.shape
     ...: out = np.empty((m,n),dtype=int)
     ...: out[np.arange(m)[:,None], sidx] = np.arange(n)
10 loops, best of 3: 156 ms per loop

Seeing a speedup of 8x+ with the proposed one for such an array.

Upvotes: 4

Paul Panzer
Paul Panzer

Reputation: 53029

Here is a 2D version of the better of the linked solutions.

def rank(x, axis=-1):
    m, n = x.shape
    res = np.empty((m, n), dtype=int)
    I = np.ogrid[:m, :n]
    rng, I[axis] = I[axis], x.argsort(axis=axis)
    res[I] = rng
    return res

And an ND version:

def rank(x, axis=-1):
    res = np.empty(x.shape, dtype=int)
    I = np.ogrid[tuple(map(slice, x.shape))]
    rng, I[axis] = I[axis], x.argsort(axis=axis)
    res[I] = rng
    return res

Upvotes: 3

Related Questions