Reputation: 3379
I have something like the following dataframe indexed by state and year:
import pandas as pd
d = pd.DataFrame(index=['CA', 'WA', 'OR', 'NV'])
d[2015]=[100, 200, 40, 75]
d[2016]=[1000, 20, 25, 12]
print d
2015 2016
CA 100 1000
WA 200 20
OR 40 25
NV 75 12
I want a new dataframe where the index is the rank of each state for that year, and the values are the states in their ranked order.
I can get that output as follows:
pd.DataFrame([d.sort_values(x).index for x in d.columns], index=d.columns, columns=range(1,len(d)+1)).transpose()
Out[57]:
2015 2016
1 OR NV
2 NV WA
3 CA OR
4 WA CA
Is there a cleaner way of getting this output?
Upvotes: 0
Views: 209
Reputation: 294488
with an obnoxious amount of formatting
d.apply(pd.Series.rank).stack().reset_index(0, name='x') \
.set_index('x', append=True).squeeze() \
.unstack(0).rename_axis(None).rename(index=int)
2015 2016
1 OR NV
2 NV WA
3 CA OR
4 WA CA
Or better with numpy
a = d.values.argsort(0)
pd.DataFrame(d.index.values[a], range(1, len(d) + 1), d.columns)
2015 2016
1 OR NV
2 NV WA
3 CA OR
4 WA CA
Upvotes: 2
Reputation: 215047
You can use argsort
, which returns the index to sort each column, apply it to the index gives the ranked index in ascending order:
d.apply(lambda x: x.index[x.argsort()])
you can call reset_index(drop=True)
if you don't want to keep the original index on the data frame.
Upvotes: 3