Reputation: 1417

Drop row based on condition

I have this DataFrame:

import pandas as pd

df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)

and I want to drop every row with message "stuck" for each name that also has a "finish" message, keeping "stuck" rows for names that never finished:

pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'finish', 'start', 'stuck', 'start', 'finish', 'finish']}
)

Bill never "finished", so his "stuck" row stays.

Upvotes: 0

Answers (2)

j__carlson

Reputation: 1348

Build a mask of the "stuck" rows whose name also appears among the "finish" rows, then negate it:

df[~((df.name.isin(df[df.message=="finish"]['name'])) & (df.message=='stuck'))]

Output:

name	message
Adam	start
Adam	finish
Bill	start
Bill	stuck
Charlie	start
Charlie	finish
Charlie	finish
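
For readability, the one-liner can be unpacked into named steps. This is only a sketch under the assumption that the question's data is in `df`; the names `finished_names` and `to_drop` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)

# Names that have at least one "finish" row (illustrative variable name).
finished_names = df.loc[df['message'] == 'finish', 'name']

# A row is dropped only if it is "stuck" AND its name has finished.
to_drop = df['name'].isin(finished_names) & (df['message'] == 'stuck')

# Keep everything else; the original index labels are preserved.
result = df[~to_drop]
```

The filtered frame keeps the original index labels; chain `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.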

Upvotes: 1

Cimbali

Reputation: 11395

To find out whether a student has finished, group by student and use any. Since we want the result back in the original shape of the dataframe, we use groupby.transform:

>>> sf = df['message'].eq('finish').groupby(df['name']).transform('any')
>>> sf
0     True
1     True
2     True
3    False
4    False
5     True
6     True
7     True
8     True
Name: message, dtype: bool

From there it’s easy to remove the "stuck" messages of students who have finished, keeping every row where the student has not finished or the message is not "stuck":

>>> df[~sf | df['message'].ne('stuck')]
      name message
0     Adam   start
2     Adam  finish
3     Bill   start
4     Bill   stuck
5  Charlie   start
7  Charlie  finish
8  Charlie  finish
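
As a self-contained check, the two steps can be wrapped in a small helper and compared against the expected frame from the question. This is a sketch, not the answerer's code: the function name `drop_stuck_if_finished` is hypothetical, and resetting the index is an assumption made so the result matches the question's expected output.

```python
import pandas as pd


def drop_stuck_if_finished(df: pd.DataFrame) -> pd.DataFrame:
    """Drop "stuck" rows for every name that has at least one "finish" row.

    Hypothetical helper for illustration; assumes columns 'name' and 'message'.
    """
    # True on every row whose name has a "finish" row somewhere in the frame.
    finished = df['message'].eq('finish').groupby(df['name']).transform('any')
    # Keep a row if its name never finished, or if the row is not "stuck".
    keep = ~finished | df['message'].ne('stuck')
    # Assumption: a fresh 0..n-1 index is wanted, as in the question's output.
    return df[keep].reset_index(drop=True)


df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)

expected = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'finish', 'start', 'stuck', 'start', 'finish', 'finish']}
)

result = drop_stuck_if_finished(df)
matches_expected = result.equals(expected)
```

Using `transform('any')` avoids building a separate list of finished names, since the per-group result is broadcast back onto each row.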

Upvotes: 1
