Reputation: 2273
I have the following df with lots of rows:
   xx  yy     zz
A   5   4  'd.1'
B   2   2  'd.1'
C   1   1  'e.1'
D   2   2  'e.2'
E   1   5  'e.2'
I would like to replace every duplicate value in column zz after its first occurrence with '0.0' (keeping the rows), in order to obtain the following output:
   xx  yy     zz
A   5   4  'd.1'
B   2   2  '0.0'
C   1   1  'e.1'
D   2   2  'e.2'
E   1   5  '0.0'
How could I get this done?
Upvotes: 1
Views: 47
Reputation: 1185
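You may use:
# True for every repeat of a zz value after its first occurrence
is_duplicate = df['zz'].duplicated()
# keep zz where it is not a duplicate, otherwise write '0.0'
df['zz'] = df['zz'].where(~is_duplicate, '0.0')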
Upvotes: 0
Reputation: 1406
This may be the pandas way to do it, provided the duplicates sit on consecutive rows:
df.loc[df.zz == df.zz.shift(), 'zz'] = '0.0'
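A minimal sketch of how the shift() comparison differs from duplicated() when a value repeats on non-adjacent rows. The Series below is hypothetical data, not taken from the question:

```python
import pandas as pd

# Hypothetical data: 'd.1' repeats, but its first repeat is not adjacent.
s = pd.Series(["d.1", "e.1", "d.1", "d.1"])

# True only where a value equals the one on the previous row.
consecutive = s == s.shift()
# True for every repeat of a value after its first occurrence.
anywhere = s.duplicated()

print(consecutive.tolist())  # [False, False, False, True]
print(anywhere.tolist())     # [False, False, True, True]
```

On the sample data in the question both masks agree, because every duplicate directly follows its first occurrence.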
Upvotes: 1
Reputation: 8033
IIUC, this is what you need:
import numpy as np
df['zz'] = np.where(df['zz'].duplicated(), '0.0', df['zz'])
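For reference, a self-contained sketch of this approach on the sample data from the question. The DataFrame constructor call is an assumption about how df was built, and zz is assumed to hold plain strings without literal quote characters:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the construction itself is an assumption.
df = pd.DataFrame(
    {
        "xx": [5, 2, 1, 2, 1],
        "yy": [4, 2, 1, 2, 5],
        "zz": ["d.1", "d.1", "e.1", "e.2", "e.2"],
    },
    index=list("ABCDE"),
)

# Replace every repeat of a zz value after its first occurrence; rows are kept.
df["zz"] = np.where(df["zz"].duplicated(), "0.0", df["zz"])

print(df["zz"].tolist())  # ['d.1', '0.0', 'e.1', 'e.2', '0.0']
```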
Upvotes: 2
Reputation: 2405
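There is a dedicated function for that, drop_duplicates. Note that it removes the duplicate rows entirely:
df = df.drop_duplicates(subset='zz', keep='first')
Update: Do you only need to replace the duplicates in column zz and keep the rows? Then, for duplicates on consecutive rows:
df.loc[df.zz == df.zz.shift(), 'zz'] = '0.0'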
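A short sketch of the difference between dropping and masking, using the sample data from the question. The DataFrame construction and the names dropped and masked are assumptions for illustration:

```python
import pandas as pd

# Sample data from the question; the construction itself is an assumption.
df = pd.DataFrame(
    {
        "xx": [5, 2, 1, 2, 1],
        "yy": [4, 2, 1, 2, 5],
        "zz": ["d.1", "d.1", "e.1", "e.2", "e.2"],
    },
    index=list("ABCDE"),
)

# drop_duplicates removes the rows whose zz value was already seen.
dropped = df.drop_duplicates(subset="zz", keep="first")
print(dropped.index.tolist())  # ['A', 'C', 'D']

# Masking keeps all rows and only overwrites the repeated zz values.
masked = df.assign(zz=df["zz"].mask(df["zz"].duplicated(), "0.0"))
print(masked.index.tolist())   # ['A', 'B', 'C', 'D', 'E']
print(masked["zz"].tolist())   # ['d.1', '0.0', 'e.1', 'e.2', '0.0']
```

Using df.loc[...] or assign, rather than df.zz.loc[...] = ..., avoids chained assignment, which pandas may warn about or silently apply to a copy.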
Upvotes: 1