Reputation: 1199
I have pandas dataframe
like this (10 columns and thousands of rows):
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
a c a d b f a c g b
b g i k l r k b m l
It has duplicates. i want to replace duplicates with (0) across the rows and keep only the first occurrence. So, the result would be like this:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
a c 0 d b f 0 0 g 0
0 0 i k l r 0 0 m 0
The values in the dataframe have type "object
". I am using python 3.5.
I found a solution in Replace duplicate values across columns in Pandas but it didn't work for me. No replacements happened.
Any other ideas to solve this issue?
Upvotes: 1
Views: 540
Reputation: 150825
You can stack
to get a series, check that series for duplicated
, unstack
the series and mask:
df.mask(df.stack().duplicated().unstack(),0)
output:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0 a c 0 d b f 0 0 g 0
1 0 0 i k l r 0 0 m 0
Upvotes: 3
Reputation: 7614
Why didn't this work?
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, 0))
Works fine for me:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0 a c 0 d b f 0 0 g 0
1 b g i k l r 0 0 m 0
Upvotes: 0