Reputation: 11660
This builds on the example from the question "Drop all duplicate rows in Python Pandas".
Let's say I don't want to drop the duplicates, but instead change the value in one of the subset columns for each duplicated row.
Following that example, if subset=['A','C'] is used to identify duplicates, I want to change column 'A' in row 1 from foo to foo1.
I have a complicated way of doing this, but there must be a simpler way that takes advantage of vectorization or built-in features.
Original df:
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
Desired df:
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A
Upvotes: 2
Views: 3944
Reputation: 353059
You could use groupby with cumcount and do something like this:
>>> c = df.groupby(["A","C"]).cumcount()
>>> c = c.replace(0, '').astype(str)
>>> df["A"] += c
>>> df
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A
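This works because cumcount numbers the rows within each ("A", "C") group starting from 0. The first occurrence of each combination gets 0, which is replaced by an empty string, and every later duplicate gets 1, 2, and so on, which is appended to "A". For this frame it gives us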
>>> df.groupby(["A","C"]).cumcount()
0 0
1 1
2 0
3 0
dtype: int64
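For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frame is rebuilt from the example in the question, and the variable names dup_rank and suffix are my own, not part of pandas. It builds the suffix with astype(str) and where rather than replace, which is just one way to do it:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# Number each row within its (A, C) group:
# 0 for the first occurrence, then 1, 2, ... for repeats.
dup_rank = df.groupby(["A", "C"]).cumcount()

# Turn the counter into a string suffix: keep the number for repeats,
# use an empty string for first occurrences.
suffix = dup_rank.astype(str).where(dup_rank > 0, "")

# Append the suffix to column A; rows are matched by index.
df["A"] = df["A"] + suffix
print(df)
```

Only row 1 changes here, since it is the only row that repeats an earlier ("A", "C") pair. A third occurrence of the same pair would become foo2.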
Upvotes: 3