Reputation: 3173
I have a sample data set that I'm trying to rank based on the values in the column 'HP':
import pandas as pd
d = {
'unit': ['UD', 'UD', 'UD' ,'UC','UC', 'UC','UA','UA','UA','UB','UB','UB'],
'N-D': [ 'C1', 'C2', 'C3','Q1', 'Q2', 'Q3','D1','D2','D3','E1','E2','E3'],
'HP': [24, 24, 24,7,7,7,7,7,7,5,5,5]
}
df = pd.DataFrame(d)
df['rank']=df['HP'].rank(ascending=False, method='dense')
df
It looks like:
HP N-D unit rank
0 24 C1 UD 1.0
1 24 C2 UD 1.0
2 24 C3 UD 1.0
3 7 Q1 UC 2.0
4 7 Q2 UC 2.0
5 7 Q3 UC 2.0
6 7 D1 UA 2.0
7 7 D2 UA 2.0
8 7 D3 UA 2.0
9 5 E1 UB 3.0
10 5 E2 UB 3.0
11 5 E3 UB 3.0
'HP' is a calculated column based on other columns (I won't show them here, but it's necessary in my real dataset).
I also tried method='min', but the outcome looks like this:
HP N-D unit rank
0 24 C1 UD 1.0
1 24 C2 UD 1.0
2 24 C3 UD 1.0
3 7 Q1 UC 4.0
4 7 Q2 UC 4.0
5 7 Q3 UC 4.0
6 7 D1 UA 4.0
7 7 D2 UA 4.0
8 7 D3 UA 4.0
9 5 E1 UB 10.0
10 5 E2 UB 10.0
11 5 E3 UB 10.0
Units 'UC' and 'UA' tie for 2nd rank. What I'm looking for is for the next rank, unit 'UB', to be 4 instead of 3:
HP N-D unit rank
0 24 C1 UD 1.0
1 24 C2 UD 1.0
2 24 C3 UD 1.0
3 7 Q1 UC 2.0
4 7 Q2 UC 2.0
5 7 Q3 UC 2.0
6 7 D1 UA 2.0
7 7 D2 UA 2.0
8 7 D3 UA 2.0
9 5 E1 UB 4.0
10 5 E2 UB 4.0
11 5 E3 UB 4.0
Upvotes: 2
Views: 2031
Reputation: 294358
Use a combination of groupby and sort_values:
g = df.sort_values(
    ['HP', 'unit'], ascending=False
).groupby(['HP', 'unit'], sort=False)
df.assign(rank=g.ngroup().add(1).groupby(df.HP).transform('first'))
HP N-D unit rank
0 24 C1 UD 1
1 24 C2 UD 1
2 24 C3 UD 1
3 7 Q1 UC 2
4 7 Q2 UC 2
5 7 Q3 UC 2
6 7 D1 UA 2
7 7 D2 UA 2
8 7 D3 UA 2
9 5 E1 UB 4
10 5 E2 UB 4
11 5 E3 UB 4
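To make the intermediate steps visible, here is a minimal sketch of the same idea split into named pieces. The names ordered and group_no are only illustrative, and it uses transform('min') in place of 'first', which should give the same result here without depending on row order inside each HP group.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': ['UD'] * 3 + ['UC'] * 3 + ['UA'] * 3 + ['UB'] * 3,
    'HP': [24] * 3 + [7] * 6 + [5] * 3,
})

# Sort so that (HP, unit) pairs appear from highest HP to lowest.
ordered = df.sort_values(['HP', 'unit'], ascending=False)

# Number each (HP, unit) pair in order of appearance: UD=1, UC=2, UA=3, UB=4.
group_no = ordered.groupby(['HP', 'unit'], sort=False).ngroup() + 1

# Units tied on HP share the smallest number in their HP group,
# so UC and UA both get 2 and UB keeps 4.
df['rank'] = group_no.groupby(df['HP']).transform('min')
```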
Another way using nunique and map:
df.assign(
    rank=df.HP.map(
        df.sort_values(
            ['HP', 'unit'], ascending=False
        ).groupby(
            'HP', sort=False
        ).unit.nunique().shift().fillna(1).cumsum())
)
HP N-D unit rank
0 24 C1 UD 1.0
1 24 C2 UD 1.0
2 24 C3 UD 1.0
3 7 Q1 UC 2.0
4 7 Q2 UC 2.0
5 7 Q3 UC 2.0
6 7 D1 UA 2.0
7 7 D2 UA 2.0
8 7 D3 UA 2.0
9 5 E1 UB 4.0
10 5 E2 UB 4.0
11 5 E3 UB 4.0
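The chain above can be read as "rank = 1 + number of units with a strictly higher HP". Below is a small sketch that spells this out step by step; units_per_hp and rank_by_hp are illustrative names, and shift(fill_value=0) is used instead of shift().fillna(1), which keeps the ranks as integers.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': ['UD'] * 3 + ['UC'] * 3 + ['UA'] * 3 + ['UB'] * 3,
    'HP': [24] * 3 + [7] * 6 + [5] * 3,
})

# Number of distinct units at each HP value, highest HP first:
# HP 24 -> 1, HP 7 -> 2, HP 5 -> 1
units_per_hp = df.groupby('HP')['unit'].nunique().sort_index(ascending=False)

# Running total of units, shifted down one step, counts the units
# strictly above each HP value; adding 1 turns that into a rank:
# HP 24 -> 1, HP 7 -> 2, HP 5 -> 4
rank_by_hp = units_per_hp.cumsum().shift(fill_value=0) + 1

# Look up each row's rank by its HP value.
df['rank'] = df['HP'].map(rank_by_hp)
```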
Upvotes: 4
Reputation: 323306
Let's try factorize:
#df['rank']=pd.factorize(df.unit)[0]+1
df['rank']=pd.factorize(df.unit+df.HP.astype(str))[0]+1
# select the column with ['rank']: attribute access (.rank) resolves to the GroupBy.rank method
df['rank']=df.groupby('HP')['rank'].transform('min')
df
Out[335]:
HP N-D unit rank
0 24 C1 UD 1
1 24 C2 UD 1
2 24 C3 UD 1
3 7 Q1 UC 2
4 7 Q2 UC 2
5 7 Q3 UC 2
6 7 D1 UA 2
7 7 D2 UA 2
8 7 D3 UA 2
9 5 E1 UB 4
10 5 E2 UB 4
11 5 E3 UB 4
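Note that factorize numbers values in order of first appearance, so this relies on the frame already being ordered from highest to lowest HP, as the sample is. If that is not guaranteed, one possible sketch is to factorize a sorted copy and let index alignment put the codes back on the original rows (the shuffled frame and the names ordered and codes are made up for illustration):

```python
import pandas as pd

# Same kind of data as the question, but deliberately out of order.
df = pd.DataFrame({
    'unit': ['UB', 'UD', 'UA', 'UC', 'UD', 'UA'],
    'HP': [5, 24, 7, 7, 24, 7],
})

# Sort a copy so that factorize sees the units in descending HP order.
ordered = df.sort_values(['HP', 'unit'], ascending=False)
codes = pd.factorize(ordered['unit'] + ordered['HP'].astype(str))[0] + 1

# Assigning a Series aligns on the index, so each code lands on its original row.
df['rank'] = pd.Series(codes, index=ordered.index)

# Units tied on HP share the smallest code in their HP group.
df['rank'] = df.groupby('HP')['rank'].transform('min')
```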
Upvotes: 2
Reputation:
One option would be to calculate the ranks by dropping the duplicates first:
temp = df[['HP', 'unit']].drop_duplicates()
temp.assign(rank=temp['HP'].rank(method='min', ascending=False)).merge(df)
Out[48]:
HP unit rank N-D
0 24 UD 1.0 C1
1 24 UD 1.0 C2
2 24 UD 1.0 C3
3 7 UC 2.0 Q1
4 7 UC 2.0 Q2
5 7 UC 2.0 Q3
6 7 UA 2.0 D1
7 7 UA 2.0 D2
8 7 UA 2.0 D3
9 5 UB 4.0 E1
10 5 UB 4.0 E2
11 5 UB 4.0 E3
Added (antonvbr): a slight modification to assign directly (this assumes df has the default 0..n-1 index and that rows are ordered as in the sample):
df['rank'] = (df[['HP', 'unit']].drop_duplicates()['HP']
              .rank(method='min', ascending=False)
              .reindex(range(len(df)), method='pad'))
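If the frame has a custom index or the rows are not grouped by unit, a variant that looks the rank up by HP value might be safer. This is only a sketch under those assumptions; the letter index and the names unique_pairs, pair_rank and rank_by_hp are illustrative.

```python
import pandas as pd

# Shuffled rows with a non-default index, to show no ordering is assumed.
df = pd.DataFrame(
    {'unit': ['UB', 'UD', 'UA', 'UC', 'UD', 'UA'],
     'HP': [5, 24, 7, 7, 24, 7]},
    index=list('abcdef'),
)

# One row per (HP, unit) pair, ranked so that each unit counts once.
unique_pairs = df[['HP', 'unit']].drop_duplicates()
pair_rank = unique_pairs['HP'].rank(method='min', ascending=False)

# The rank depends only on HP, so collapse to one value per HP
# and look it up for every row.
rank_by_hp = pair_rank.groupby(unique_pairs['HP']).first()
df['rank'] = df['HP'].map(rank_by_hp)
```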
Upvotes: 3
Reputation: 1827
rankings = df[["HP","unit"]].drop_duplicates()
rankings['rank'] = rankings['HP'].rank(ascending=False, method='min')
df = df.merge(rankings)
df
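For reference, a slightly more explicit sketch of the same merge, with the join keys spelled out. The how='left' and validate='many_to_one' arguments are my additions: they keep every row of df in its original order and raise if rankings ever contains a duplicated (HP, unit) pair. The result is named ranked here rather than overwriting df.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': ['UD', 'UD', 'UC', 'UA', 'UB'],
    'N-D': ['C1', 'C2', 'Q1', 'D1', 'E1'],
    'HP': [24, 24, 7, 7, 5],
})

# One row per (HP, unit) pair, so each unit is counted once when ranking.
rankings = df[['HP', 'unit']].drop_duplicates()
rankings = rankings.assign(
    rank=rankings['HP'].rank(ascending=False, method='min')
)

# Attach the per-unit rank to every original row.
ranked = df.merge(rankings, on=['HP', 'unit'], how='left',
                  validate='many_to_one')
```

Note that merge returns a frame with a fresh 0..n-1 index, so any custom index on df would need to be restored afterwards.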
Upvotes: 0