Taie

Replace duplicated values across different columns in pandas with '0'

I have a pandas DataFrame like this (10 columns and thousands of rows):

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     a    d    b    f    a    c    g     b
     b    g     i    k    l    r    k    b    m     l

It has duplicates. I want to replace them with 0 and keep only the first occurrence of each value, checking across all rows rather than within each row separately. So the result would look like this:

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     0    d    b    f    0    0    g     0
     0    0     i    k    l    r    0    0    m     0
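The expected output follows a single rule: a cell becomes 0 if its value already appeared earlier in the frame, reading each row left to right and the rows from top to bottom. That is why b and g in the second row are replaced even though they occur only once in that row. Below is a minimal plain-Python sketch of that rule, intended as a reference for checking pandas solutions against; zero_repeats is a hypothetical helper name, not a pandas function.

```python
def zero_repeats(rows):
    """Hypothetical helper: replace every value already seen earlier
    (each row left to right, rows top to bottom) with 0."""
    seen = set()
    result = []
    for row in rows:
        new_row = []
        for value in row:
            if value in seen:
                new_row.append(0)      # repeat of an earlier value
            else:
                seen.add(value)        # first occurrence: keep it
                new_row.append(value)
        result.append(new_row)
    return result


# The two example rows from the question.
rows = [
    ["a", "c", "a", "d", "b", "f", "a", "c", "g", "b"],
    ["b", "g", "i", "k", "l", "r", "k", "b", "m", "l"],
]
for row in zero_repeats(rows):
    print(row)
```

It loops in pure Python, so it is a readable reference rather than a substitute for a vectorized pandas solution.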

The columns have dtype object. I am using Python 3.5.

I found a solution in "Replace duplicate values across columns in Pandas", but it didn't work for me: no replacements happened.

Any other ideas to solve this issue?

Answers (2)

Quang Hoang

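You can stack the DataFrame into a Series (one entry per cell, read row by row), flag repeats with duplicated() (which keeps the first occurrence by default), unstack the flags back into a boolean DataFrame and pass it to mask: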

# mask() returns a new DataFrame; cells flagged as repeats become 0
df.mask(df.stack().duplicated().unstack(), 0)

output:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    0    0    i    k    l    r    0    0    m     0
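The one-liner can be unpacked into its individual steps. Here is a sketch that rebuilds the two example rows from the question (the column names col1 to col10 come from it; the variable names are illustrative) and names each intermediate result:

```python
import pandas as pd

# The example frame from the question: two rows, ten object columns.
cols = ["col{}".format(i) for i in range(1, 11)]
df = pd.DataFrame([list("acadbfacgb"), list("bgiklrkbml")], columns=cols)

# 1. stack(): one Series indexed by (row, column), in row-major order.
stacked = df.stack()

# 2. duplicated(): True for each value already seen earlier in that order
#    (keep="first" is the default).
is_dup = stacked.duplicated()

# 3. unstack(): turn the flags back into a row x column boolean frame.
flags = is_dup.unstack()

# 4. mask(): put 0 wherever the flag is True; df itself is not modified.
out = df.mask(flags, 0)
print(out)
```

Because mask aligns the boolean frame with df on index and columns, the column order that unstack produces does not matter. Since mask returns a new DataFrame, the result has to be assigned (here to out) to be kept.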

NYC Coder

Why didn't this work?

import pandas as pd

is_duplicate = df.apply(pd.Series.duplicated, axis=1)  # True for repeats within each row
print(df.where(~is_duplicate, 0))  # keep first occurrences, 0 elsewhere

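Works fine for me, although it only compares values within each row, so b and g in row 1 are kept even though they already appear in row 0 (the expected output in the question replaces them):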

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    b    g    i    k    l    r    0    0    m     0
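The row-wise and whole-frame versions can be compared directly on the question's data. Below is a sketch, assuming the two example rows from the question, that runs this answer's apply/duplicated code next to the stack/duplicated/unstack approach from the other answer and prints the second row of each result:

```python
import pandas as pd

cols = ["col{}".format(i) for i in range(1, 11)]
df = pd.DataFrame([list("acadbfacgb"), list("bgiklrkbml")], columns=cols)

# Row-wise: duplicated() runs on each row separately, so a value is only
# compared with earlier cells of the same row.
row_wise = df.where(~df.apply(pd.Series.duplicated, axis=1), 0)

# Whole-frame: duplicated() runs once over all cells in row-major order.
whole_frame = df.mask(df.stack().duplicated().unstack(), 0)

print(row_wise.loc[1].tolist())
print(whole_frame.loc[1].tolist())
```

The first rows of both results agree; only the whole-frame version reproduces the second row of the expected output.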
