harry
harry

Reputation: 165

How to Make column Duplicate Values to Unique?

df:

 colA   ColB   ColC
0  A      B      345
1  B      C      876
2  D      B      983
3  A      B      371
4  G      B      972
5  H      K      193
6  G      B      367
7  D      J      293


   colA  ColB   ColC
0  A_A    B      345
1  B      C      876
2  D      B      983
3  A_B    B      371
4  G_A    B      972
5  H      K      193
6  G_B    B      367
7  D      J      293

How to make Column Values unique if it is duplicates, For Example if you see I am checking ColA and ColB, if find any duplicates pair and I am appending _A to first one and _B to second one in ColA. If there are 3 pairs of duplicates then it will Append the _C to to the third Pair.

Upvotes: 1

Views: 527

Answers (1)

jezrael
jezrael

Reputation: 863791

If there is maximal 26 duplicated values like alphabets create dictionary by enumerate with string.ascii_uppercase, select only duplicated rows by DataFrame.duplicated and add new values created by counter by GroupBy.cumcount and Series.map:

import string

d = dict(enumerate(string.ascii_uppercase))

print (len(d))
26

m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().map(d)
print (df)
  colA ColB  ColC
0  A_A    B   345
1    B    C   876
2    D    B   983
3  A_B    B   371
4  G_A    B   972
5    H    K   193
6  G_B    B   367
7    D    J   293

If possible add numbers instead alphabets is possible solution simplify:

m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().astype(str)
print (df)
  colA ColB  ColC
0  A_0    B   345
1    B    C   876
2    D    B   983
3  A_1    B   371
4  G_0    B   972
5    H    K   193
6  G_1    B   367
7    D    J   293

Upvotes: 2

Related Questions