Reputation: 1892

pandas create new column to reflect rank within group based on column

I have date as follows

group	name	score
1	p2382	7.55
1	p9183	3.22
1	p1253	5.77
2	p3382	2.11
2	p5583	1.22
2	p1211	0.77
3	p0482	8.55
3	p7374	8.21
3	p2382	7.77

How do I create a new column 'rank' that outputs the rank of each name in a group based on the score?

The output should be sorted based on rank

group	name	rank	score
1	p2382	1	7.55
1	p1253	2	5.77
1	p9183	3	3.22
2	p3382	1	2.11
2	p5583	2	1.22
2	p1211	3	0.77
3	p0482	1	8.55
3	p7374	2	8.21
3	p2382	3	7.77

Upvotes: 0

Answers (2)

Henry Ecker

Reputation: 35686

Use groupby rank with ascending=False and method='first':

df['rank'] = (
    df.groupby('group')['score']
        .rank(ascending=False, method='first')
        .astype(int)
)

df (using the first table as input):

   group   name  score  rank
0      1  p2382   7.55     1
1      1  p9183   3.22     3
2      1  p1253   5.77     2
3      2  p3382   2.11     1
4      2  p5583   1.22     2
5      2  p1211   0.77     3
6      3  p0482   8.55     1
7      3  p7374   8.21     2
8      3  p2382   7.77     3

insert can be used to put the new column directly in the correct location:

df.insert(
    2, 'rank',
    df.groupby('group')['score']
        .rank(ascending=False, method='first')
        .astype(int)
)

df:

   group   name  rank  score
0      1  p2382     1   7.55
1      1  p9183     3   3.22
2      1  p1253     2   5.77
3      2  p3382     1   2.11
4      2  p5583     2   1.22
5      2  p1211     3   0.77
6      3  p0482     1   8.55
7      3  p7374     2   8.21
8      3  p2382     3   7.77

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'name': ['p2382', 'p9183', 'p1253', 'p3382', 'p5583', 'p1211', 'p0482',
             'p7374', 'p2382'],
    'score': [7.55, 3.22, 5.77, 2.11, 1.22, 0.77, 8.55, 8.21, 7.77]
})

Example with duplicate values:

df = pd.DataFrame({
    'group': [1, 1, 1, 1, ],
    'name': ['a', 'b', 'c', 'd'],
    'score': [7, 2, 1, 2]
})

df['rank'] = (
    df.groupby('group')['score']
        .rank(ascending=False, method='first')
        .astype(int)
)

df:

   group name  score  rank
0      1    a      7     1
1      1    b      2     2
2      1    c      1     4
3      1    d      2     3  # Increases even though duplicate

Upvotes: 1

BENY

Reputation: 323396

Let us try sort_values with cumcount

df['rank'] = df.sort_values('score',ascending=False).groupby('group')['score'].cumcount()+1
df
Out[172]: 
   group   name  score  rank
0      1  p2382   7.55     1
1      1  p9183   3.22     3
2      1  p1253   5.77     2
3      2  p3382   2.11     1
4      2  p5583   1.22     2
5      2  p1211   0.77     3
6      3  p0482   8.55     1
7      3  p7374   8.21     2
8      3  p2382   7.77     3

Upvotes: 1

pandas create new column to reflect rank within group based on column

Answers (2)

Related Questions