Sam
Sam

Reputation: 1246

How do I replace duplicate value in a column to make it unique based on another column in Pandas?

I am trying to make a column unique if they are duplicate and if another column values are duplicate. In this dataset.

    Name    Age
0   Tom 20
1   Tom 20
2   Jack    19
3   Terry   18
4   Terry   20

I want to make Tom to be stay same where as Terry to be Terry_1 and Terry_2 and so on if we have more than 1 terry and if age doesn't match. I tried code below.

import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry','Terry'], 'Age':[20, 20, 19, 18,20]} 
df = pd.DataFrame(data)
df.Name=df.Name+df.groupby(['Name','Age']).cumcount().astype(str)

What I get is:

Name    Age
0   Tom0    20
1   Tom1    20
2   Jack0   19
3   Terry0  18
4   Terry0  20

What I want is :

Name    Age
0   Tom 20
1   Tom 20
2   Jack    19
3   Terry_1 18
4   Terry_2 20

Since Tom's age is same so we know we have same Tom but Terry is different based on Age.

Thank you, Sam

Upvotes: 3

Views: 238

Answers (2)

Erfan
Erfan

Reputation: 42886

We can if the group has no duplicates and if the size of the group > 1. Then we assign the values with loc and cumsum:

dups = ~df[['Name', 'Age']].duplicated(keep=False)
single = df.groupby('Name')['Age'].transform('size').ne(1)
grps = dups & single
df.loc[grps, 'Name'] = (
    df.loc[grps, 'Name'] + 
    '_' + 
    grps.groupby(df['Name']).cumsum().astype(int).astype(str)
)
      Name  Age
0      Tom   20
1      Tom   20
2     Jack   19
3  Terry_1   18
4  Terry_2   20

Upvotes: 0

BENY
BENY

Reputation: 323226

This can be done by transform and factorize, then we just need mask by the nunique

s=df.groupby('Name').Age
df['New']=(df.Name+'_'+s.transform(lambda x : (x.factorize()[0]+1)).astype(str)).\
              mask(s.transform('nunique')==1,df.Name)
df
Out[230]: 
    Name  Age      New
0    Tom   20      Tom
1    Tom   20      Tom
2   Jack   19     Jack
3  Terry   18  Terry_1
4  Terry   20  Terry_2

Upvotes: 4

Related Questions