Reputation: 1246
I am trying to make a column unique if they are duplicate and if another column values are duplicate. In this dataset.
Name Age
0 Tom 20
1 Tom 20
2 Jack 19
3 Terry 18
4 Terry 20
I want to make Tom to be stay same where as Terry to be Terry_1 and Terry_2 and so on if we have more than 1 terry and if age doesn't match. I tried code below.
import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry','Terry'], 'Age':[20, 20, 19, 18,20]}
df = pd.DataFrame(data)
df.Name=df.Name+df.groupby(['Name','Age']).cumcount().astype(str)
What I get is:
Name Age
0 Tom0 20
1 Tom1 20
2 Jack0 19
3 Terry0 18
4 Terry0 20
What I want is :
Name Age
0 Tom 20
1 Tom 20
2 Jack 19
3 Terry_1 18
4 Terry_2 20
Since Tom's age is same so we know we have same Tom but Terry is different based on Age.
Thank you, Sam
Upvotes: 3
Views: 238
Reputation: 42886
We can if the group has no duplicates and if the size of the group > 1
. Then we assign the values with loc
and cumsum
:
dups = ~df[['Name', 'Age']].duplicated(keep=False)
single = df.groupby('Name')['Age'].transform('size').ne(1)
grps = dups & single
df.loc[grps, 'Name'] = (
df.loc[grps, 'Name'] +
'_' +
grps.groupby(df['Name']).cumsum().astype(int).astype(str)
)
Name Age
0 Tom 20
1 Tom 20
2 Jack 19
3 Terry_1 18
4 Terry_2 20
Upvotes: 0
Reputation: 323226
This can be done by transform
and factorize
, then we just need mask by the nunique
s=df.groupby('Name').Age
df['New']=(df.Name+'_'+s.transform(lambda x : (x.factorize()[0]+1)).astype(str)).\
mask(s.transform('nunique')==1,df.Name)
df
Out[230]:
Name Age New
0 Tom 20 Tom
1 Tom 20 Tom
2 Jack 19 Jack
3 Terry 18 Terry_1
4 Terry 20 Terry_2
Upvotes: 4