Reputation: 198
I have table where i want to remove duplicate values from particular column and want single value for those duplicate values.
Input table:
col1 col2 col3 col4
afsdd asdf fff d
b f m d
dfdf d dd d
df ff f g
dd dl jf g
ddd dll dfd h
Desired output:
col1 col2 col3 col4
afsdd asdf fff d
b f m NA(empty)
dfdf d dd NA(empty)
df ff f g
dd dl jf NA(empty)
ddd dll dfd h
Here i want to remove duplicate column values by deleting or removing all the column values except first value and the duplicate column value should be removed in the sense that column should become empty.I don't want to drop entire row.
The below code i have tried:
df_final.drop_duplicates(subset ="col4")
but its removing all duplicates along with rows, but i want result in above mentioned format.
Upvotes: 0
Views: 595
Reputation: 42916
This is Series.mask
with Series.duplicated
:
df_final['col4'] = df_final['col4'].mask(df_final['col4'].duplicated())
col1 col2 col3 col4
0 afsdd asdf fff d
1 b f m NaN
2 dfdf d dd NaN
3 df ff f g
4 dd dl jf NaN
5 ddd dll dfd h
Upvotes: 3