Reputation: 41
My DataFrame is:
df = pd.DataFrame({
'ID':['27637', '27637', '27637', '27637', '89283', '89283', '89283', '89283'],
'Country':['UK', 'Poland', 'Poland', 'Poland', 'China', 'China', 'India', 'India']})
I added an additional column by
df['Dense_Rank'] = df.groupby('ID')['Country'].rank(method='dense')
However, I don't like group 27637 as UK is marked as 2.0 while Poland 1.0. As UK appears first, I expect it to be marked as 1.0 while Poland as 2.0. Grouping for 89283 looks good.
Here is my expected result:
ID | Country | Dense_Rank | Expected_Result |
---|---|---|---|
27637 | UK | 2.0 | 1.0 |
27637 | Poland | 1.0 | 2.0 |
27637 | Poland | 1.0 | 2.0 |
27637 | Poland | 1.0 | 2.0 |
89283 | China | 1.0 | 1.0 |
89283 | China | 1.0 | 1.0 |
89283 | India | 2.0 | 2.0 |
89283 | India | 2.0 | 2.0 |
Upvotes: 0
Views: 30
Reputation: 37747
One option, is to use pandas.factorize
:
df['Expected_Result'] = df.groupby(['ID'])['Country'].transform(lambda x: pd.factorize(x)[0]+1)
Output :
print(df)
ID Country Dense_Rank Expected_Result
0 27637 UK 2.0 1
1 27637 Poland 1.0 2
2 27637 Poland 1.0 2
3 27637 Poland 1.0 2
4 89283 China 1.0 1
5 89283 China 1.0 1
6 89283 India 2.0 2
7 89283 India 2.0 2
Upvotes: 1