Reputation: 1227
I have a data frame df where some rows are duplicates with respect to a subset of columns:
df
A B C D
1 Blue Green 4
2 Red Green 6
3 Red Green 2
4 Blue Pink 6
5 Blue Orange 9
6 Blue Orange 8
7 Blue Red 8
8 Red Orange 9
I would like to replace values for duplicate rows with respect to B and C and replace the whole row by 'ERR', ideally producing:
A B C D
1 Blue Green 4
ERR ERR ERR ERR
ERR ERR ERR ERR
4 Blue Pink 6
ERR ERR ERR ERR
ERR ERR ERR ERR
7 Blue Red 8
8 Red Orange 9
So brief: If there are duplicate rows for columns B and C, all the values in those rows should be set to 'ERR' (not only the duplicate ones).
Solved! -> thanks to @anky_91
df = pd.DataFrame({"A": [1,2,3,4,5,6,7,8], "B": ['Blue', 'Red', 'Red', 'Blue', 'Blue', 'Blue', 'Blue', 'Red'], "C": ['Green', 'Green', 'Green', 'Pink', 'Orange', 'Orange', 'Red', 'Orange'], "D": [4,6,2,6,9,8,8,9]})
df = df.mask(df.duplicated(['B','C'], keep=False), 'ERR')
print(df)
Upvotes: 0
Views: 807
Reputation: 75080
You can use df.mask
here with df.duplicated
df.mask(df.duplicated(['B','C'],keep=False),'ERR')
A B C D
0 1 Blue Green 4
1 ERR ERR ERR ERR
2 ERR ERR ERR ERR
3 4 Blue Pink 6
4 ERR ERR ERR ERR
5 ERR ERR ERR ERR
6 7 Blue Red 8
7 8 Red Orange 9
Upvotes: 1