John Shin
John Shin

Reputation: 149

pandas sort a column by values in another column

I have a dataset that I want to sort and assign rank based on it.

Suppose it has two columns, one is year and the other is the column that I want to sort.

import pandas as pd
data = {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([5, 10, 4, 1])}
df = pd.DataFrame(data)

I want to sort the column 'value' by each year and then give rank to it. What I would like to have is

data2= {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([10, 5, 4, 1]),  
        'rank': pd.Series([1, 2, 1, 2]}
df2=pd.DataFrame(data2)

>>> df2
   rank  value  year
0     1     10  2006
1     2      5  2006
2     1      4  2007
3     2      1  2007

Upvotes: 7

Views: 22509

Answers (2)

Alexander
Alexander

Reputation: 109546

You can use groupby and then use rank (with ascending=False to get the largest values first). You don't need to sort in the groupby, as the result is indexed to the dataframe (slightly faster performance).

df['yearly_rank'] = df.groupby('year', sort=False)['value'].rank(ascending=False)

>>> df.sort_values(['year', 'yearly_rank'])
   value  year  yearly_rank
1     10  2006            1
0      5  2006            2
2      4  2007            1
3      1  2007            2

Upvotes: 12

Parfait
Parfait

Reputation: 107587

Consider a groupby apply function with sort:

def rankfct(row):    
    row['rank'] = row['value'].rank(ascending=False)    
    return row

df = df.groupby(['year']).apply(rankfct).sort(['year','value'], ascending=[1,0])

Upvotes: 0

Related Questions