Reputation: 2202
I would like to drop all rows in a pandas DataFrame that meet a certain condition, except the first one. Note that the rows are not identical, so I cannot use drop_duplicates().
For example, if I have the dataframe:
Type Count
A 4
X 33
X 5
E 51
Y 7
and I filter with df[~df.Type.isin(['X', 'Y'])], every row where the type is X or Y is removed, resulting in:
Type Count
A 4
E 51
but I want to keep the first row that satisfies the condition, so that the result is:
Type Count
A 4
X 33
E 51
Any suggestions would be appreciated.
Upvotes: 0
Views: 1221
Reputation: 13387
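Try giving every matching row the same key and every other row a unique key, then dropping duplicates on that key:
import numpy as np
# Matching rows all get the key 'x'; the other rows get their own (unique) position
df['k'] = np.where(df['Type'].isin(['X', 'Y']), 'x', np.arange(len(df)))
# Keep the first row for each key, then drop the helper column
res = df.drop_duplicates(subset='k').drop('k', axis=1)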
Outputs:
>>> res
Type Count
0 A 4
1 X 33
3 E 51
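For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shared-key idea. It is a variation, not the exact code above: it assumes an integer sentinel of -1 for the matching rows (which cannot collide with the non-negative row positions) and uses assign() so the original df does not pick up the helper column. The names mask and key are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Type": ["A", "X", "X", "E", "Y"],
                   "Count": [4, 33, 5, 51, 7]})

mask = df["Type"].isin(["X", "Y"])
# Matching rows share the sentinel key -1; every other row keeps its unique position.
key = np.where(mask, -1, np.arange(len(df)))

res = (
    df.assign(k=key)                # helper column lives only on a copy
      .drop_duplicates(subset="k")  # keeps the first row for each key
      .drop(columns="k")
)
print(res)
```

Because the helper column is added to a copy, df itself still has only the Type and Count columns afterwards.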
Upvotes: 0
Reputation: 42896
We can build the boolean mask and set its first True value to False with idxmax, which returns the index label of the first True:
m = df.Type.isin(['X', 'Y'])
m.loc[m.idxmax()] = False
df[~m]
Type Count
0 A 4
1 X 33
3 E 51
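A related sketch, in case the index may contain repeated labels (where assigning through m.loc[label] would touch every row sharing that label): count the matches with cumsum and keep only the one whose running count is 1. This swaps idxmax for cumsum, and the helper name drop_all_but_first_match is hypothetical, not part of pandas.

```python
import pandas as pd

def drop_all_but_first_match(df, mask):
    """Drop rows where mask is True, except the first such row."""
    # The running count of matches equals 1 exactly at the first match
    # (and at any non-matching rows before the second match, which ~mask keeps anyway).
    first_match = mask & (mask.cumsum() == 1)
    keep = ~mask | first_match
    # Index positionally so repeated index labels cannot cause misalignment.
    return df[keep.to_numpy()]

df = pd.DataFrame({"Type": ["A", "X", "X", "E", "Y"],
                   "Count": [4, 33, 5, 51, 7]})
out = drop_all_but_first_match(df, df["Type"].isin(["X", "Y"]))
print(out)
```

If no row matches the condition, the mask is all False and the frame comes back unchanged.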
Upvotes: 1