AJG519
AJG519

Reputation: 3379

Get ordered index by dataframe column values

I have something like the following dataframe indexed by state and year:

import pandas as pd

d = pd.DataFrame(index=['CA', 'WA', 'OR', 'NV'])
d[2015]=[100, 200, 40, 75]
d[2016]=[1000, 20, 25, 12]
print d
    2015  2016
CA   100  1000
WA   200    20
OR    40    25
NV    75    12

I want a new dataframe where the index is the rank of each state for that year, and the values are the states in their ranked order.

I can get that output as follows:

pd.DataFrame([d.sort_values(x).index for x in d.columns], index=d.columns, columns=range(1,len(d)+1)).transpose()
Out[57]: 
  2015 2016
1   OR   NV
2   NV   WA
3   CA   OR
4   WA   CA

Is there a cleaner way of getting this output?

Upvotes: 0

Views: 209

Answers (2)

piRSquared
piRSquared

Reputation: 294488

with an obnoxious amount of formatting

d.apply(pd.Series.rank).stack().reset_index(0, name='x') \
    .set_index('x', append=True).squeeze() \
    .unstack(0).rename_axis(None).rename(index=int)

  2015 2016
1   OR   NV
2   NV   WA
3   CA   OR
4   WA   CA

Or better with numpy

a = d.values.argsort(0)

pd.DataFrame(d.index.values[a], range(1, len(d) + 1), d.columns)

  2015 2016
1   OR   NV
2   NV   WA
3   CA   OR
4   WA   CA

Upvotes: 2

akuiper
akuiper

Reputation: 215047

You can use argsort, which returns the index to sort each column, apply it to the index gives the ranked index in ascending order:

d.apply(lambda x: x.index[x.argsort()])

enter image description here

you can call reset_index(drop=True) if you don't want to keep the original index on the data frame.

Upvotes: 3

Related Questions