Reputation: 2273

Replace duplicate values after the first occurrence in a column while keeping the rows

I have the following df with lots of rows:

    xx   yy   zz
A   5    4   'd.1'
B   2    2   'd.1'
C   1    1   'e.1'
D   2    2   'e.2'
E   1    5   'e.2'

I would like to replace every duplicate value in column zz after its first occurrence with '0.0', keeping all the rows, in order to obtain the following output:

    xx   yy   zz
A   5    4   'd.1'
B   2    2   '0.0'   
C   1    1   'e.1'
D   2    2   'e.2'
E   1    5   '0.0'

How could I get this done?

Upvotes: 1

Answers (4)

Reputation: 1185

You may use:

# duplicated() marks every repeat of a value after its first occurrence
is_duplicate = df['zz'].duplicated()
df['zz'] = df['zz'].where(~is_duplicate, '0.0')

Upvotes: 0

Reputation: 1406

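Maybe this is the pandas way to do that, as long as the duplicates always sit in adjacent rows (shift() only compares each value with the one directly above it):

df.loc[df.zz == df.zz.shift(), 'zz'] = '0.0'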
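
Comparing against shift() only looks at the row directly above, so it can give a different result from duplicated() when repeats are not adjacent. Here is a minimal sketch of the difference, using a made-up single-column frame (the data is an assumption for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical data: 'd.1' appears in rows A, C and D, so one repeat is not adjacent.
df = pd.DataFrame({"zz": ["d.1", "e.1", "d.1", "d.1"]}, index=list("ABCD"))

# True only where a value equals the value in the row directly above it.
consecutive = df["zz"] == df["zz"].shift()

# True for every repeat after the first occurrence, wherever it appears.
any_repeat = df["zz"].duplicated()

shift_result = df["zz"].mask(consecutive, "0.0")
duplicated_result = df["zz"].mask(any_repeat, "0.0")

print(shift_result.tolist())
print(duplicated_result.tolist())
```

With the question's sample data both approaches should agree, because every repeat there directly follows its first occurrence.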

Upvotes: 1

Reputation: 8033

IIUC this is what you need.

import numpy as np
df['zz'] = np.where(df['zz'].duplicated(), '0.0', df['zz'])
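
For anyone who wants to try this end to end, here is a small self-contained sketch that rebuilds the frame from the question and applies the same np.where call. It assumes the quotes around the zz values in the question are just display notation, so the column holds plain strings such as d.1.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (zz assumed to hold plain strings).
df = pd.DataFrame(
    {
        "xx": [5, 2, 1, 2, 1],
        "yy": [4, 2, 1, 2, 5],
        "zz": ["d.1", "d.1", "e.1", "e.2", "e.2"],
    },
    index=list("ABCDE"),
)

# Replace every repeat after the first occurrence; no rows are dropped.
df["zz"] = np.where(df["zz"].duplicated(), "0.0", df["zz"])

print(df)
```

Rows B and E end up with '0.0' in zz, and all five rows are kept.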

Upvotes: 2

Reputation: 2405

There is a dedicated function for removing duplicates, drop_duplicates. Note that it drops the whole row:

df = df.drop_duplicates(subset='zz', keep='first')

Update: if you only need to replace the duplicates in column zz and keep the rows (this handles duplicates in adjacent rows):

df.loc[df.zz == df.zz.shift(), 'zz'] = '0.0'

Upvotes: 1