Reputation: 601
I have a dataframe which looks like this:
import pandas as pd
df = pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
                   'X': [1, 2, 1, 2, 2, 3, 4, 5],
                   'Y': [2, 1, 2, 2, 7, 5, 7, 7],
                   'Z': [2, 1, 2, 1, 5, 8, 1, 9]})
df
Out[10]:
A B X Y Z
0 A C1 1 2 2
1 B C1 2 1 1
2 C C1 1 2 2
3 D C1 2 2 1
4 E C2 2 7 5
5 F C2 3 5 8
6 G C2 4 7 1
7 H C2 5 7 9
I need to sort the dataframe by columns B, Z, Y, X and then rank the rows within each group of B.
The resulting dataframe should look like this:
Out[12]:
A B X Y Z R
1 B C1 2 1 1 1
3 D C1 2 2 1 2
0 A C1 1 2 2 3
2 C C1 1 2 2 4
6 G C2 4 7 1 1
4 E C2 2 7 5 2
5 F C2 3 5 8 3
7 H C2 5 7 9 4
I know I can use df.sort_values(['B', 'Z', 'Y', 'X']) to get the rows into the right order, but I am struggling to apply the rank.
What is a one-line way to do both the sorting and the ranking?
Upvotes: 0
Views: 1142
Reputation: 150805
You can use groupby().cumcount():
df['R'] = df.sort_values(['B','Z','Y','X']).groupby('B').cumcount() + 1
Output:
A B X Y Z R
0 A C1 1 2 2 3
1 B C1 2 1 1 1
2 C C1 1 2 2 4
3 D C1 2 2 1 2
4 E C2 2 7 5 2
5 F C2 3 5 8 3
6 G C2 4 7 1 1
7 H C2 5 7 9 4
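The reason the rows stay in their original order is that the Series returned by cumcount() keeps the original index labels, and column assignment aligns on those labels. Here is a minimal, self-contained sketch of that, assuming the sample data from the question and the ['B', 'Z', 'Y', 'X'] sort order from the question's sort_values call (the intermediate name ranks is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
    'X': [1, 2, 1, 2, 2, 3, 4, 5],
    'Y': [2, 1, 2, 2, 7, 5, 7, 7],
    'Z': [2, 1, 2, 1, 5, 8, 1, 9],
})

# cumcount() numbers the rows 0..n-1 within each group of B,
# following the order produced by sort_values; +1 makes it 1-based.
ranks = df.sort_values(['B', 'Z', 'Y', 'X']).groupby('B').cumcount() + 1

# ranks still carries the original index labels (1, 3, 0, 2, ...),
# so the assignment aligns by label and df keeps its original row order.
df['R'] = ranks
print(df)
```

Note that rows A and C are identical on X, Y and Z; cumcount() still gives them different numbers (3 and 4), so this is a position within the sorted group rather than a rank with ties.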
To also match the row order of your output, run sort_values and groupby() as separate steps:
df = df.sort_values(['B','Z','Y','X'])
df['R'] = df.groupby('B').cumcount() + 1
Output:
A B X Y Z R
1 B C1 2 1 1 1
3 D C1 2 2 1 2
0 A C1 1 2 2 3
2 C C1 1 2 2 4
6 G C2 4 7 1 1
4 E C2 2 7 5 2
5 F C2 3 5 8 3
7 H C2 5 7 9 4
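Since the question asks for a single line, the two steps can probably also be chained with assign(), which accepts a callable that receives the already-sorted frame. This is only a sketch under the same sample data; the name out is hypothetical, and the result is a new dataframe rather than a change to df in place:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
    'X': [1, 2, 1, 2, 2, 3, 4, 5],
    'Y': [2, 1, 2, 2, 7, 5, 7, 7],
    'Z': [2, 1, 2, 1, 5, 8, 1, 9],
})

# Sort first, then let assign() compute R on the sorted frame (d),
# so the counter follows the sorted row order within each group of B.
out = (
    df.sort_values(['B', 'Z', 'Y', 'X'])
      .assign(R=lambda d: d.groupby('B').cumcount() + 1)
)
print(out)
```

The lambda matters here: writing df.groupby('B').cumcount() directly inside assign() would count rows in the unsorted order of df instead.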
Upvotes: 2