tj judge
tj judge

Reputation: 616

Break ties using rank function (OR other function) PYTHON

I have the following dataframe:

ID Name    Weight Score  
1  Amazon    2    11     
1  Apple     4    10     
1  Netflix   1    10     
2  Amazon    2    8      
2  Apple     4    8      
2  Netflix   1    5     

Currently I have a code which looks like this

#add weight and score column
df['Rank'] = df['Weight'] + df['Score']
#create score rank on ID column
df['Score_Rank'] = df.groupby('ID')['Rank'].rank("first", ascending = False) 

This code does not give me exactly what I want.

I would like to first rank on Score, without including the weight. And then break any ties in the rank by adding weight column to break them. If there are further ties after weight column has been added, then rank would be by random selection.

I think an if statement could work in this scenario, just not sure how.

Expected output:

ID Name    Weight Score  Score_Rank
1  Amazon    2    11     1
1  Apple     4    10     2
1  Netflix   1    10     3
2  Amazon    2    8      2
2  Apple     4    8      1
2  Netflix   1    5      3   

Upvotes: 3

Views: 2756

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

You can use rank with method='first' with some presorting first:

df['Score_Rank'] = (df.sort_values('Weight', ascending=False)
                      .groupby(['ID'])['Score']
                      .rank(method='first', ascending=False)
                   )

Output:

   ID     Name  Weight  Score  Score_Rank
0   1   Amazon       2     11         1.0
1   1    Apple       4     10         2.0
2   1  Netflix       1     10         3.0
3   2   Amazon       2      8         2.0
4   2    Apple       4      8         1.0
5   2  Netflix       1      5         3.0

Details:

First, sort your dataframe by weights descending, then use rank with method first on Score which will break ties based on sort order of the dataframe. And because, pandas does intrinsic data alignment, assign to new column 'Score_Rank' yeilds the based on original order of the dataframe.

Upvotes: 6

Quang Hoang
Quang Hoang

Reputation: 150745

Try with cumcount:

df['Score_Rank'] = (df.sort_values(['Score','Weight'])
                      .groupby(['ID']).cumcount(ascending=False)+1
                   )

Output:

   ID     Name  Weight  Score  Score_Rank
0   1   Amazon       2     11           1
1   1    Apple       4     10           2
2   1  Netflix       1     10           3
3   2   Amazon       2      8           2
4   2    Apple       4      8           1
5   2  Netflix       1      5           3

Upvotes: 4

Related Questions