Reputation: 4828

Rank within columns of 2d array

>>> a = array([[10, 50, 20, 30, 40],
...            [50, 30, 40, 20, 10],
...            [30, 20, 20, 10, 50]])

>>> some_np_expression(a)
array([[1, 3, 1, 3, 2],
       [3, 2, 3, 2, 1],
       [2, 1, 2, 1, 3]])

What is some_np_expression? I don't care how ties are broken, as long as the ranks within each column are distinct and sequential.

Upvotes: 6

Answers (3)

Sachin Rastogi

Reputation: 477

from scipy.stats.mstats import rankdata
import numpy as np

a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])

rank = (rankdata(a, axis=0)-1).astype(int)

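The output is as follows. The ranks are zero-based, and the two tied 20s in the third column both get rank 0, because rankdata averages tied ranks (1.5 each here) and the cast to int truncates the result after the subtraction.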

array([[0, 2, 0, 2, 1],
       [2, 1, 2, 1, 0],
       [1, 0, 0, 0, 2]])

Upvotes: 0

Primoz

Reputation: 1492

SciPy now offers a ranking function with an axis argument, so you can choose the axis along which the data is ranked.

from numpy import array
from scipy.stats.mstats import rankdata

a = array([[10, 50, 20, 30, 40],
           [50, 30, 40, 20, 10],
           [30, 20, 20, 10, 50]])

# axis=0 ranks within each column; tied values receive their average rank

ranked_vertical = rankdata(a, axis=0)
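
If distinct ranks are needed, as the question asks, one option is the non-masked scipy.stats.rankdata with method='ordinal', which numbers tied values in order of appearance. This is a minimal sketch, assuming SciPy 1.4 or later (the release in which, as far as I know, this function gained its axis argument):

```python
import numpy as np
from scipy.stats import rankdata  # note: scipy.stats, not scipy.stats.mstats

a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])

# method='ordinal' gives every element a distinct rank; ties are broken
# by position, so the two 20s in the third column get ranks 1 and 2.
ranked_ordinal = rankdata(a, method='ordinal', axis=0)
```

The result is 1-based, so no offset is needed to match the output requested in the question.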

Upvotes: 3

Warren Weckesser

Reputation: 114811

Double argsort is a standard (but inefficient!) way to do this:

In [120]: a
Out[120]: 
array([[10, 50, 20, 30, 40],
       [50, 30, 40, 20, 10],
       [30, 20, 20, 10, 50]])

In [121]: a.argsort(axis=0).argsort(axis=0) + 1
Out[121]: 
array([[1, 3, 1, 3, 2],
       [3, 2, 3, 2, 1],
       [2, 1, 2, 1, 3]])
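
To see why the double argsort yields ranks, it can help to look at a single column. The sketch below uses a made-up 1-D array (not from the question): the first argsort lists the indices in sorted order, and the second argsort inverts that permutation, giving each element's position in the sorted order.

```python
import numpy as np

# Illustrative data only; any 1-D array of distinct values would do.
col = np.array([30, 10, 20, 40])

# Indices that would sort col: the smallest value sits at index 1, and so on.
order = col.argsort()

# Argsort of a permutation is its inverse: position[k] is where col[k]
# lands in the sorted array (0-based).
position = order.argsort()

ranks = position + 1  # 1-based ranks
```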

With some more code, you can avoid sorting twice. Note that I'm using a different a in the following:

In [262]: a
Out[262]: 
array([[30, 30, 10, 10],
       [10, 20, 20, 30],
       [20, 10, 30, 20]])

Call argsort once:

In [263]: s = a.argsort(axis=0)

Use s to construct the array of rankings:

In [264]: i = np.arange(a.shape[0]).reshape(-1, 1)

In [265]: j = np.arange(a.shape[1])

In [266]: ranked = np.empty_like(a, dtype=int)

In [267]: ranked[s, j] = i + 1

In [268]: ranked
Out[268]: 
array([[3, 3, 1, 1],
       [1, 2, 2, 3],
       [2, 1, 3, 2]])

Here's the less efficient (but more concise) version:

In [269]: a.argsort(axis=0).argsort(axis=0) + 1
Out[269]: 
array([[3, 3, 1, 1],
       [1, 2, 2, 3],
       [2, 1, 3, 2]])
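
The single-argsort steps above can be collected into a small function. This is only a sketch: the name rank_columns is made up for illustration, and kind='stable' is an added assumption so that ties are broken by row order.

```python
import numpy as np

def rank_columns(a):
    """Return 1-based ranks within each column of a 2-D array (hypothetical helper)."""
    a = np.asarray(a)
    # One stable argsort per column; tied values keep their original row order.
    s = a.argsort(axis=0, kind='stable')
    i = np.arange(a.shape[0]).reshape(-1, 1)  # positions in sorted order
    j = np.arange(a.shape[1])                 # column indices
    ranked = np.empty(a.shape, dtype=int)
    # The element at sorted position i in column j receives rank i + 1.
    ranked[s, j] = i + 1
    return ranked

a = np.array([[30, 30, 10, 10],
              [10, 20, 20, 30],
              [20, 10, 30, 20]])
ranked = rank_columns(a)
```

On this array the result matches the double-argsort output shown above.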

Upvotes: 7
