Reputation: 123
I am trying to rank a large dataset using python. I do not want duplicates and rather than using the 'first' method, I would instead like it to look at another column and rank it based on that value.
It should only look at the second column if the rank in the first column has duplicates.
Name CountA CountB
Alpha 15 3
Beta 20 52
Delta 20 31
Gamma 45 43
I would like the ranking to end up
Name CountA CountB Rank
Alpha 15 3 4
Beta 20 52 2
Delta 20 31 3
Gamma 45 43 1
Currently, I am using df.rank(ascending=False, method='first')
Upvotes: 1
Views: 1461
Reputation: 2494
You can take the counts of the values in CountA and then filter the DataFrame rows based on the count of CountA being greater than 1. Where the count is greater than 1, take CountB, otherwise CountA.
df = pd.DataFrame([[15,3],[20,52],[20,31],[45,43]],columns=['CountA','CountB'])
colAcount = df['CountA'].value_counts()
#then take the indices where colACount > 1 and use them in a where
df['final'] = df['CountA'].where(~df['CountA'].isin(colAcount[colAcount>1].index),df['CountB'])
df = df.sort_values(by='final', ascending=False).reset_index(drop=True)
# the rank is the index
CountA CountB final
0 20 52 52
1 45 43 45
2 20 31 31
3 15 3 15
See this for more details.
Upvotes: 0
Reputation: 46938
Maybe use sort and pull out the index:
import pandas as pd
df = pd.DataFrame({'Name':['A','B','C','D'],'CountA':[15,20,20,45],'CountB':[3,52,31,43]})
df['rank'] = df.sort_values(['CountA','CountB'],ascending=False).index + 1
Name CountA CountB rank
0 A 15 3 4
1 B 20 52 2
2 C 20 31 3
3 D 45 43 1
Upvotes: 1