Andre
Andre

Reputation: 123

Ranking with no duplicates

I am trying to rank a large dataset using python. I do not want duplicates and rather than using the 'first' method, I would instead like it to look at another column and rank it based on that value.

It should only look at the second column if the rank in the first column has duplicates.

Name   CountA CountB
Alpha  15     3
Beta   20     52
Delta  20     31
Gamma  45     43

I would like the ranking to end up

Name   CountA CountB Rank
Alpha  15     3      4
Beta   20     52     2
Delta  20     31     3
Gamma  45     43     1

Currently, I am using df.rank(ascending=False, method='first')

Upvotes: 1

Views: 1461

Answers (2)

forgetso
forgetso

Reputation: 2494

You can take the counts of the values in CountA and then filter the DataFrame rows based on the count of CountA being greater than 1. Where the count is greater than 1, take CountB, otherwise CountA.

df = pd.DataFrame([[15,3],[20,52],[20,31],[45,43]],columns=['CountA','CountB'])

colAcount = df['CountA'].value_counts()

#then take the indices where colACount > 1 and use them in a where

df['final'] = df['CountA'].where(~df['CountA'].isin(colAcount[colAcount>1].index),df['CountB'])

df = df.sort_values(by='final', ascending=False).reset_index(drop=True)

# the rank is the index
   CountA  CountB  final
0      20      52     52
1      45      43     45
2      20      31     31
3      15       3     15

See this for more details.

Upvotes: 0

StupidWolf
StupidWolf

Reputation: 46938

Maybe use sort and pull out the index:

import pandas as pd
df = pd.DataFrame({'Name':['A','B','C','D'],'CountA':[15,20,20,45],'CountB':[3,52,31,43]}) 
df['rank'] = df.sort_values(['CountA','CountB'],ascending=False).index + 1 

  Name  CountA  CountB  rank
0    A      15       3     4
1    B      20      52     2
2    C      20      31     3
3    D      45      43     1

Upvotes: 1

Related Questions