pandas sort a column by values in another column

Question

I have a dataset that I want to sort and assign rank based on it.

Suppose it has two columns, one is year and the other is the column that I want to sort.

import pandas as pd
data = {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([5, 10, 4, 1])}
df = pd.DataFrame(data)

I want to sort the column 'value' by each year and then give rank to it. What I would like to have is

data2= {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([10, 5, 4, 1]),  
        'rank': pd.Series([1, 2, 1, 2]}
df2=pd.DataFrame(data2)

>>> df2
   rank  value  year
0     1     10  2006
1     2      5  2006
2     1      4  2007
3     2      1  2007

Alexander · Accepted Answer

You can use groupby and then use rank (with ascending=False to get the largest values first). You don't need to sort in the groupby, as the result is indexed to the dataframe (slightly faster performance).

df['yearly_rank'] = df.groupby('year', sort=False)['value'].rank(ascending=False)

>>> df.sort_values(['year', 'yearly_rank'])
   value  year  yearly_rank
1     10  2006            1
0      5  2006            2
2      4  2007            1
3      1  2007            2

pandas sort a column by values in another column

Answers (2)

Related Questions