How do I replace duplicate value in a column to make it unique based on another column in Pandas?

Question

I am trying to make a column unique if they are duplicate and if another column values are duplicate. In this dataset.

    Name    Age
0   Tom 20
1   Tom 20
2   Jack    19
3   Terry   18
4   Terry   20

I want to make Tom to be stay same where as Terry to be Terry_1 and Terry_2 and so on if we have more than 1 terry and if age doesn't match. I tried code below.

import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry','Terry'], 'Age':[20, 20, 19, 18,20]} 
df = pd.DataFrame(data)
df.Name=df.Name+df.groupby(['Name','Age']).cumcount().astype(str)

What I get is:

Name    Age
0   Tom0    20
1   Tom1    20
2   Jack0   19
3   Terry0  18
4   Terry0  20

What I want is :

Name    Age
0   Tom 20
1   Tom 20
2   Jack    19
3   Terry_1 18
4   Terry_2 20

Since Tom's age is same so we know we have same Tom but Terry is different based on Age.

Thank you, Sam

BENY · Accepted Answer

This can be done by transform and factorize, then we just need mask by the nunique

s=df.groupby('Name').Age
df['New']=(df.Name+'_'+s.transform(lambda x : (x.factorize()[0]+1)).astype(str)).\
              mask(s.transform('nunique')==1,df.Name)
df
Out[230]: 
    Name  Age      New
0    Tom   20      Tom
1    Tom   20      Tom
2   Jack   19     Jack
3  Terry   18  Terry_1
4  Terry   20  Terry_2

How do I replace duplicate value in a column to make it unique based on another column in Pandas?

Answers (2)

Related Questions