Reputation: 165
df:
colA ColB ColC
0 A B 345
1 B C 876
2 D B 983
3 A B 371
4 G B 972
5 H K 193
6 G B 367
7 D J 293
colA ColB ColC
0 A_A B 345
1 B C 876
2 D B 983
3 A_B B 371
4 G_A B 972
5 H K 193
6 G_B B 367
7 D J 293
How to make Column Values unique if it is duplicates, For Example if you see I am checking ColA and ColB, if find any duplicates pair and I am appending _A to first one and _B to second one in ColA. If there are 3 pairs of duplicates then it will Append the _C to to the third Pair.
Upvotes: 1
Views: 527
Reputation: 863791
If there is maximal 26
duplicated values like alphabets create dictionary by enumerate
with string.ascii_uppercase
, select only duplicated rows by DataFrame.duplicated
and add new values created by counter by GroupBy.cumcount
and Series.map
:
import string
d = dict(enumerate(string.ascii_uppercase))
print (len(d))
26
m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().map(d)
print (df)
colA ColB ColC
0 A_A B 345
1 B C 876
2 D B 983
3 A_B B 371
4 G_A B 972
5 H K 193
6 G_B B 367
7 D J 293
If possible add numbers instead alphabets is possible solution simplify:
m = df.duplicated(['colA', 'ColB'], keep=False)
df.loc[m, 'colA'] += '_' + df[m].groupby(['colA', 'ColB']).cumcount().astype(str)
print (df)
colA ColB ColC
0 A_0 B 345
1 B C 876
2 D B 983
3 A_1 B 371
4 G_0 B 972
5 H K 193
6 G_1 B 367
7 D J 293
Upvotes: 2