Reputation: 1417
I have this DataFrame:
import pandas as pd
df = pd.DataFrame(
{'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)
and I want to drop every "stuck" row for each name that also has a "finish" row, keeping "stuck" rows for names that never finished:
pd.DataFrame(
{'name': ['Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie'],
'message': ['start', 'finish', 'start', 'stuck', 'start', 'finish', 'finish']}
)
Bill never "finished", so his "stuck" row remains.
Upvotes: 0
Views: 47
Reputation: 1348
This will work:
df[~((df.name.isin(df[df.message=="finish"]['name'])) & (df.message=='stuck'))]
Output:
name | message |
---|---|
Adam | start |
Adam | finish |
Bill | start |
Bill | stuck |
Charlie | start |
Charlie | finish |
Charlie | finish |
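
The one-liner can be easier to follow when split into named steps. This is a minimal sketch that rebuilds the question's DataFrame; the names finished_names, drop_mask and result are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill',
             'Charlie', 'Charlie', 'Charlie', 'Charlie'],
    'message': ['start', 'stuck', 'finish', 'start', 'stuck',
                'start', 'stuck', 'finish', 'finish'],
})

# Names that have at least one "finish" row (illustrative variable name).
finished_names = df.loc[df['message'].eq('finish'), 'name'].unique()

# Rows to drop: "stuck" rows that belong to a name that finished.
drop_mask = df['name'].isin(finished_names) & df['message'].eq('stuck')

# Keep everything else; the original index labels are preserved.
result = df[~drop_mask]
print(result)
```

If a fresh 0..n-1 index is wanted, as in the question's expected output, result.reset_index(drop=True) should give that.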
Upvotes: 1
Reputation: 11395
To check whether a student has any "finish" message, group by student and use any. We want the result back in the original shape of the dataframe, so we use groupby.transform:
>>> sf = df['message'].eq('finish').groupby(df['name']).transform('any')
>>> sf
0 True
1 True
2 True
3 False
4 False
5 True
6 True
7 True
8 True
Name: message, dtype: bool
From there it’s easy to remove the "stuck" messages of students that have finished, keeping every row where the student has not finished or the message is not "stuck":
>>> df[~sf | df['message'].ne('stuck')]
name message
0 Adam start
2 Adam finish
3 Bill start
4 Bill stuck
5 Charlie start
7 Charlie finish
8 Charlie finish
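
The same approach can be wrapped in a small reusable function. This is only a sketch: the helper name drop_stuck_if_finished and its key and col parameters are hypothetical, not part of pandas or of the answer above.

```python
import pandas as pd

def drop_stuck_if_finished(df, key='name', col='message'):
    """Drop "stuck" rows for every group that has at least one "finish" row."""
    # True on every row of a group that contains a "finish" message.
    has_finished = df[col].eq('finish').groupby(df[key]).transform('any')
    # Keep a row if its group never finished, or if it is not a "stuck" row.
    return df[~has_finished | df[col].ne('stuck')]

df = pd.DataFrame({
    'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill',
             'Charlie', 'Charlie', 'Charlie', 'Charlie'],
    'message': ['start', 'stuck', 'finish', 'start', 'stuck',
                'start', 'stuck', 'finish', 'finish'],
})

cleaned = drop_stuck_if_finished(df)
print(cleaned)
```

The transform step matters here: a plain groupby aggregation would return one value per name, whereas transform broadcasts the per-group result back to one value per row, so it can be used directly as a boolean mask.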
Upvotes: 1