user3420561
user3420561

Reputation: 198

How to drop duplicate values of columns in excel using pandas with particular condition

I have table where i want to remove duplicate values from particular column and want single value for those duplicate values.

Input table:

col1  col2 col3  col4

afsdd  asdf  fff   d

b       f    m     d

dfdf    d    dd    d

df      ff    f    g

dd      dl    jf   g

ddd     dll   dfd  h

Desired output:

col1  col2 col3  col4

afsdd  asdf  fff   d

b       f    m     NA(empty)

dfdf    d    dd    NA(empty)

df      ff    f    g

dd      dl    jf   NA(empty)

ddd     dll   dfd  h

Here i want to remove duplicate column values by deleting or removing all the column values except first value and the duplicate column value should be removed in the sense that column should become empty.I don't want to drop entire row.

The below code i have tried:

df_final.drop_duplicates(subset ="col4")

but its removing all duplicates along with rows, but i want result in above mentioned format.

Upvotes: 0

Views: 595

Answers (1)

Erfan
Erfan

Reputation: 42916

This is Series.mask with Series.duplicated:

df_final['col4'] = df_final['col4'].mask(df_final['col4'].duplicated())

    col1  col2 col3 col4
0  afsdd  asdf  fff    d
1      b     f    m  NaN
2   dfdf     d   dd  NaN
3     df    ff    f    g
4     dd    dl   jf  NaN
5    ddd   dll  dfd    h

Upvotes: 3

Related Questions