Jessica

Reputation: 3173

Pandas rank: dense-style ties, but skip numbers after a tie

I have a sample data set that I'm trying to rank based on the values in the column 'HP':

import pandas as pd

d = {
 'unit': ['UD', 'UD', 'UD' ,'UC','UC', 'UC','UA','UA','UA','UB','UB','UB'],
 'N-D': [ 'C1', 'C2', 'C3','Q1', 'Q2', 'Q3','D1','D2','D3','E1','E2','E3'],
 'HP': [24, 24, 24,7,7,7,7,7,7,5,5,5]
}
df = pd.DataFrame(d)

df['rank']=df['HP'].rank(ascending=False, method='dense')

df

The result looks like this:

    HP N-D unit  rank
0   24  C1   UD   1.0
1   24  C2   UD   1.0
2   24  C3   UD   1.0
3    7  Q1   UC   2.0
4    7  Q2   UC   2.0
5    7  Q3   UC   2.0
6    7  D1   UA   2.0
7    7  D2   UA   2.0
8    7  D3   UA   2.0
9    5  E1   UB   3.0
10   5  E2   UB   3.0
11   5  E3   UB   3.0

'HP' is a calculated column based on other columns (I won't show the calculation here, but it is necessary in my real dataset).

I also tried method='min', but the outcome looks like this:

    HP N-D unit  rank
0   24  C1   UD   1.0
1   24  C2   UD   1.0
2   24  C3   UD   1.0
3    7  Q1   UC   4.0
4    7  Q2   UC   4.0
5    7  Q3   UC   4.0
6    7  D1   UA   4.0
7    7  D2   UA   4.0
8    7  D3   UA   4.0
9    5  E1   UB  10.0
10   5  E2   UB  10.0
11   5  E3   UB  10.0

Units 'UC' and 'UA' tie for 2nd rank. What I'm looking for is for the next rank, unit 'UB', to be 4 instead of 3:

    HP N-D unit  rank
0   24  C1   UD   1.0
1   24  C2   UD   1.0
2   24  C3   UD   1.0
3    7  Q1   UC   2.0
4    7  Q2   UC   2.0
5    7  Q3   UC   2.0
6    7  D1   UA   2.0
7    7  D2   UA   2.0
8    7  D3   UA   2.0
9    5  E1   UB   4.0
10   5  E2   UB   4.0
11   5  E3   UB   4.0

Upvotes: 2

Views: 2031

Answers (4)

piRSquared

Reputation: 294358

Use a combination of groupby and sort_values:

g = df.sort_values(
    ['HP', 'unit'], ascending=False
).groupby(['HP', 'unit'], sort=False)

df.assign(rank=g.ngroup().add(1).groupby(df.HP).transform('first'))

    HP N-D unit  rank
0   24  C1   UD     1
1   24  C2   UD     1
2   24  C3   UD     1
3    7  Q1   UC     2
4    7  Q2   UC     2
5    7  Q3   UC     2
6    7  D1   UA     2
7    7  D2   UA     2
8    7  D3   UA     2
9    5  E1   UB     4
10   5  E2   UB     4
11   5  E3   UB     4

Another way, using nunique and map:

df.assign(
    rank=df.HP.map(
        df.sort_values(
            ['HP', 'unit'], ascending=False
        ).groupby(
            'HP', sort=False
        ).unit.nunique().shift().fillna(1).cumsum())
)

    HP N-D unit  rank
0   24  C1   UD   1.0
1   24  C2   UD   1.0
2   24  C3   UD   1.0
3    7  Q1   UC   2.0
4    7  Q2   UC   2.0
5    7  Q3   UC   2.0
6    7  D1   UA   2.0
7    7  D2   UA   2.0
8    7  D3   UA   2.0
9    5  E1   UB   4.0
10   5  E2   UB   4.0
11   5  E3   UB   4.0
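
For anyone wondering why the nunique/map version works, here is a minimal sketch that splits the chain into named steps. The names units_per_hp and rank_by_hp are mine, not part of the answer above, and I sort on 'HP' only, which I believe is enough for counting distinct units.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': ['UD', 'UD', 'UD', 'UC', 'UC', 'UC', 'UA', 'UA', 'UA', 'UB', 'UB', 'UB'],
    'N-D': ['C1', 'C2', 'C3', 'Q1', 'Q2', 'Q3', 'D1', 'D2', 'D3', 'E1', 'E2', 'E3'],
    'HP': [24, 24, 24, 7, 7, 7, 7, 7, 7, 5, 5, 5],
})

# Count the distinct units for each HP value, keeping HP in descending order
# (sort=False preserves the order produced by sort_values).
units_per_hp = (
    df.sort_values('HP', ascending=False)
      .groupby('HP', sort=False)['unit']
      .nunique()
)

# shift() moves each count down one row, so every HP value sees how many
# units sit directly above it; fillna(1) seeds the first rank at 1, and
# cumsum() accumulates the skipped positions.
rank_by_hp = units_per_hp.shift().fillna(1).cumsum().astype(int)

# Look up each row's rank from its HP value.
ranked = df.assign(rank=df['HP'].map(rank_by_hp))
print(ranked)
```

The astype(int) call is only there to avoid the float ranks that shift() introduces; drop it if floats are fine.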

Upvotes: 4

BENY

Reputation: 323306

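Let us try factorize:

#df['rank']=pd.factorize(df.unit)[0]+1
# Number each (unit, HP) pair in order of first appearance
# (this relies on the rows already being ordered by HP, highest first).
df['rank']=pd.factorize(df.unit+df.HP.astype(str))[0]+1

# Take the smallest number within each HP group. Select the column with
# ['rank'] so it cannot be confused with the GroupBy.rank method.
df['rank']=df.groupby('HP')['rank'].transform('min')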
df
Out[335]: 
    HP N-D unit  rank
0   24  C1   UD     1
1   24  C2   UD     1
2   24  C3   UD     1
3    7  Q1   UC     2
4    7  Q2   UC     2
5    7  Q3   UC     2
6    7  D1   UA     2
7    7  D2   UA     2
8    7  D3   UA     2
9    5  E1   UB     4
10   5  E2   UB     4
11   5  E3   UB     4
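
Note that factorize numbers the pairs in order of appearance, so the result above depends on the rows being ordered by HP from highest to lowest. If that is not guaranteed, a possible workaround is to sort a copy first and let index alignment put the ranks back. This is only a sketch: the shuffled frame below is made-up data, and the name ordered is mine.

```python
import pandas as pd

# Hypothetical data in arbitrary row order.
df = pd.DataFrame({
    'unit': ['UB', 'UA', 'UD', 'UC', 'UB', 'UA', 'UD', 'UC'],
    'HP': [5, 7, 24, 7, 5, 7, 24, 7],
})

# Work on a copy sorted by HP, highest first, so factorize sees the
# (unit, HP) pairs in ranking order.
ordered = df.sort_values('HP', ascending=False)
ordered['rank'] = pd.factorize(ordered['unit'] + ordered['HP'].astype(str))[0] + 1

# Smallest pair number within each HP group; the result keeps the original
# index labels, so the assignment aligns back onto the unsorted frame.
df['rank'] = ordered.groupby('HP')['rank'].transform('min')
print(df)
```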

Upvotes: 2

user2285236

Reputation:

One option would be to calculate the ranks by dropping the duplicates first:

temp = df[['HP', 'unit']].drop_duplicates()
temp.assign(rank=temp['HP'].rank(method='min', ascending=False)).merge(df)
Out[48]: 
    HP unit  rank N-D
0   24   UD   1.0  C1
1   24   UD   1.0  C2
2   24   UD   1.0  C3
3    7   UC   2.0  Q1
4    7   UC   2.0  Q2
5    7   UC   2.0  Q3
6    7   UA   2.0  D1
7    7   UA   2.0  D2
8    7   UA   2.0  D3
9    5   UB   4.0  E1
10   5   UB   4.0  E2
11   5   UB   4.0  E3 

Edit (antonvbr): a slight modification that assigns the result directly:

df['rank']= (df[['HP', 'unit']].drop_duplicates()['HP']
            .rank(method='min', ascending=False)
            .reindex(range(len(df)),method='pad'))
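
As far as I can tell, the reindex(..., method='pad') step assumes a default 0..n-1 index and that rows of the same unit are contiguous. If either assumption might not hold, one alternative is to look the rank up by HP value instead. A minimal sketch, with a made-up string index and helper names (pairs, rank_by_hp) of my own:

```python
import pandas as pd

# Same sample data, but with a hypothetical non-default index.
df = pd.DataFrame({
    'unit': ['UD', 'UD', 'UD', 'UC', 'UC', 'UC', 'UA', 'UA', 'UA', 'UB', 'UB', 'UB'],
    'N-D': ['C1', 'C2', 'C3', 'Q1', 'Q2', 'Q3', 'D1', 'D2', 'D3', 'E1', 'E2', 'E3'],
    'HP': [24, 24, 24, 7, 7, 7, 7, 7, 7, 5, 5, 5],
}, index=list('abcdefghijkl'))

# One row per (HP, unit) pair, ranked with method='min' so that ties share
# a rank and the following rank is skipped.
pairs = df[['HP', 'unit']].drop_duplicates()
pair_rank = pairs['HP'].rank(method='min', ascending=False)

# All pairs with the same HP share one rank, so keep a single rank per HP.
rank_by_hp = pair_rank.groupby(pairs['HP']).first()

# Map every row to the rank of its HP value; no assumption about the index.
df['rank'] = df['HP'].map(rank_by_hp)
print(df)
```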

Upvotes: 3

Gabriel A

Reputation: 1827

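# Rank each distinct (HP, unit) pair once, so ties share a rank and the next rank is skipped
rankings = df[["HP","unit"]].drop_duplicates()
rankings['rank'] = rankings['HP'].rank(ascending=False, method='min')
# Merge on the shared columns ('HP' and 'unit') to attach the rank to every row
df = df.merge(rankings)
df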

Upvotes: 0
