the_tc
the_tc

Reputation: 41

Pandas: insert a number for the unique value in each group

My DataFrame is:

df = pd.DataFrame({
'ID':['27637', '27637', '27637', '27637', '89283', '89283', '89283', '89283'],
'Country':['UK', 'Poland', 'Poland', 'Poland', 'China', 'China', 'India', 'India']})

I added an additional column by df['Dense_Rank'] = df.groupby('ID')['Country'].rank(method='dense')

However, I don't like group 27637 as UK is marked as 2.0 while Poland 1.0. As UK appears first, I expect it to be marked as 1.0 while Poland as 2.0. Grouping for 89283 looks good.

Here is my expected result:

ID Country Dense_Rank Expected_Result
27637 UK 2.0 1.0
27637 Poland 1.0 2.0
27637 Poland 1.0 2.0
27637 Poland 1.0 2.0
89283 China 1.0 1.0
89283 China 1.0 1.0
89283 India 2.0 2.0
89283 India 2.0 2.0

Upvotes: 0

Views: 30

Answers (1)

Timeless
Timeless

Reputation: 37747

One option, is to use pandas.factorize :

df['Expected_Result'] = df.groupby(['ID'])['Country'].transform(lambda x: pd.factorize(x)[0]+1)

Output :

print(df)

      ID Country  Dense_Rank  Expected_Result
0  27637      UK         2.0                1
1  27637  Poland         1.0                2
2  27637  Poland         1.0                2
3  27637  Poland         1.0                2
4  89283   China         1.0                1
5  89283   China         1.0                1
6  89283   India         2.0                2
7  89283   India         2.0                2

Upvotes: 1

Related Questions