Reputation: 1892
I have date as follows
group | name | score |
---|---|---|
1 | p2382 | 7.55 |
1 | p9183 | 3.22 |
1 | p1253 | 5.77 |
2 | p3382 | 2.11 |
2 | p5583 | 1.22 |
2 | p1211 | 0.77 |
3 | p0482 | 8.55 |
3 | p7374 | 8.21 |
3 | p2382 | 7.77 |
How do I create a new column 'rank' that outputs the rank of each name in a group based on the score?
The output should be sorted based on rank
group | name | rank | score |
---|---|---|---|
1 | p2382 | 1 | 7.55 |
1 | p1253 | 2 | 5.77 |
1 | p9183 | 3 | 3.22 |
2 | p3382 | 1 | 2.11 |
2 | p5583 | 2 | 1.22 |
2 | p1211 | 3 | 0.77 |
3 | p0482 | 1 | 8.55 |
3 | p7374 | 2 | 8.21 |
3 | p2382 | 3 | 7.77 |
Upvotes: 0
Views: 504
Reputation: 35636
Use groupby rank
with ascending=False
and method='first'
:
df['rank'] = (
df.groupby('group')['score']
.rank(ascending=False, method='first')
.astype(int)
)
df
(using the first table as input):
group name score rank
0 1 p2382 7.55 1
1 1 p9183 3.22 3
2 1 p1253 5.77 2
3 2 p3382 2.11 1
4 2 p5583 1.22 2
5 2 p1211 0.77 3
6 3 p0482 8.55 1
7 3 p7374 8.21 2
8 3 p2382 7.77 3
insert
can be used to put the new column directly in the correct location:
df.insert(
2, 'rank',
df.groupby('group')['score']
.rank(ascending=False, method='first')
.astype(int)
)
df
:
group name rank score
0 1 p2382 1 7.55
1 1 p9183 3 3.22
2 1 p1253 2 5.77
3 2 p3382 1 2.11
4 2 p5583 2 1.22
5 2 p1211 3 0.77
6 3 p0482 1 8.55
7 3 p7374 2 8.21
8 3 p2382 3 7.77
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
'group': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'name': ['p2382', 'p9183', 'p1253', 'p3382', 'p5583', 'p1211', 'p0482',
'p7374', 'p2382'],
'score': [7.55, 3.22, 5.77, 2.11, 1.22, 0.77, 8.55, 8.21, 7.77]
})
Example with duplicate values:
df = pd.DataFrame({
'group': [1, 1, 1, 1, ],
'name': ['a', 'b', 'c', 'd'],
'score': [7, 2, 1, 2]
})
df['rank'] = (
df.groupby('group')['score']
.rank(ascending=False, method='first')
.astype(int)
)
df
:
group name score rank
0 1 a 7 1
1 1 b 2 2
2 1 c 1 4
3 1 d 2 3 # Increases even though duplicate
Upvotes: 1
Reputation: 323236
Let us try sort_values
with cumcount
df['rank'] = df.sort_values('score',ascending=False).groupby('group')['score'].cumcount()+1
df
Out[172]:
group name score rank
0 1 p2382 7.55 1
1 1 p9183 3.22 3
2 1 p1253 5.77 2
3 2 p3382 2.11 1
4 2 p5583 1.22 2
5 2 p1211 0.77 3
6 3 p0482 8.55 1
7 3 p7374 8.21 2
8 3 p2382 7.77 3
Upvotes: 1