Algorithmatic
Algorithmatic

Reputation: 1892

pandas create new column to reflect rank within group based on column

I have date as follows

group name score
1 p2382 7.55
1 p9183 3.22
1 p1253 5.77
2 p3382 2.11
2 p5583 1.22
2 p1211 0.77
3 p0482 8.55
3 p7374 8.21
3 p2382 7.77

How do I create a new column 'rank' that outputs the rank of each name in a group based on the score?

The output should be sorted based on rank

group name rank score
1 p2382 1 7.55
1 p1253 2 5.77
1 p9183 3 3.22
2 p3382 1 2.11
2 p5583 2 1.22
2 p1211 3 0.77
3 p0482 1 8.55
3 p7374 2 8.21
3 p2382 3 7.77

Upvotes: 0

Views: 504

Answers (2)

Henry Ecker
Henry Ecker

Reputation: 35636

Use groupby rank with ascending=False and method='first':

df['rank'] = (
    df.groupby('group')['score']
        .rank(ascending=False, method='first')
        .astype(int)
)

df (using the first table as input):

   group   name  score  rank
0      1  p2382   7.55     1
1      1  p9183   3.22     3
2      1  p1253   5.77     2
3      2  p3382   2.11     1
4      2  p5583   1.22     2
5      2  p1211   0.77     3
6      3  p0482   8.55     1
7      3  p7374   8.21     2
8      3  p2382   7.77     3

insert can be used to put the new column directly in the correct location:

df.insert(
    2, 'rank',
    df.groupby('group')['score']
        .rank(ascending=False, method='first')
        .astype(int)
)

df:

   group   name  rank  score
0      1  p2382     1   7.55
1      1  p9183     3   3.22
2      1  p1253     2   5.77
3      2  p3382     1   2.11
4      2  p5583     2   1.22
5      2  p1211     3   0.77
6      3  p0482     1   8.55
7      3  p7374     2   8.21
8      3  p2382     3   7.77

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'name': ['p2382', 'p9183', 'p1253', 'p3382', 'p5583', 'p1211', 'p0482',
             'p7374', 'p2382'],
    'score': [7.55, 3.22, 5.77, 2.11, 1.22, 0.77, 8.55, 8.21, 7.77]
})

Example with duplicate values:

df = pd.DataFrame({
    'group': [1, 1, 1, 1, ],
    'name': ['a', 'b', 'c', 'd'],
    'score': [7, 2, 1, 2]
})

df['rank'] = (
    df.groupby('group')['score']
        .rank(ascending=False, method='first')
        .astype(int)
)

df:

   group name  score  rank
0      1    a      7     1
1      1    b      2     2
2      1    c      1     4
3      1    d      2     3  # Increases even though duplicate

Upvotes: 1

BENY
BENY

Reputation: 323236

Let us try sort_values with cumcount

df['rank'] = df.sort_values('score',ascending=False).groupby('group')['score'].cumcount()+1
df
Out[172]: 
   group   name  score  rank
0      1  p2382   7.55     1
1      1  p9183   3.22     3
2      1  p1253   5.77     2
3      2  p3382   2.11     1
4      2  p5583   1.22     2
5      2  p1211   0.77     3
6      3  p0482   8.55     1
7      3  p7374   8.21     2
8      3  p2382   7.77     3

Upvotes: 1

Related Questions