Reputation: 886
I have a dataframe and I would like to filter it so that it only keeps groups in which no row has a certain value in a column.
For example, in the dataframe below, Hamilton has an overtake on lap 3 of his stint 1, so I want to remove ALL of Hamilton's stint 1 lap time records.
I thought of doing a groupby and then a get_group, iterating through each row in the group to detect a non-null value in the "clear lap?" column, labelling all rows of that group "yes" in a new column, and then filtering the group out.
Is there a faster way of subsetting the dataframe?
Dataframe:
                    name  driverRef  stint        tyre  lap  pos  clear lap?
0  Australian Grand Prix     vettel    1.0  Super soft    2    1         NaN
1  Australian Grand Prix     vettel    1.0  Super soft    3    1         NaN
2  Australian Grand Prix     vettel    1.0  Super soft    4    1         NaN
3  Australian Grand Prix        ham    1.0  Super soft    2    3         NaN
4  Australian Grand Prix        ham    1.0  Super soft    3    2    overtook
5  Australian Grand Prix        ham    1.0  Super soft    4    2         NaN
Upvotes: 1
Views: 248
Reputation: 863166
I believe you need to first get all the groups that contain a non-null value by filtering, and then filter again with isin:
Note: thank you, @Vivek Kalyanarangan, for the improvement with unique.
a = df.loc[df['clear lap?'].notnull(), 'driverRef'].unique()
print (a)
['ham']
df = df[~df['driverRef'].isin(a)]
print (df)
                    name  driverRef  stint        tyre  lap  pos  clear lap?
0  Australian Grand Prix     vettel    1.0  Super soft    2    1         NaN
1  Australian Grand Prix     vettel    1.0  Super soft    3    1         NaN
2  Australian Grand Prix     vettel    1.0  Super soft    4    1         NaN
Another solution, slower:
df = df[df['clear lap?'].isnull().groupby(df['driverRef']).transform('all')]
Or the slowest:
df = df.groupby('driverRef').filter(lambda x: x['clear lap?'].isnull().all())
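The snippets above group on driverRef only, so they would drop every stint of a driver who has an overtake in any one of them. The question asks to drop only the affected stint, so a possible variant is to group on both driverRef and stint. The following is a minimal sketch under that assumption. The dataframe is rebuilt from the sample in the question, and the last row (ham, stint 2, "Soft" tyre) is a hypothetical addition, not part of the original data, included only to show that an unaffected stint survives.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical extra row
# (ham, stint 2.0) that is NOT in the original post.
df = pd.DataFrame({
    "name": ["Australian Grand Prix"] * 7,
    "driverRef": ["vettel"] * 3 + ["ham"] * 4,
    "stint": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0],
    "tyre": ["Super soft"] * 6 + ["Soft"],
    "lap": [2, 3, 4, 2, 3, 4, 5],
    "pos": [1, 1, 1, 3, 2, 2, 2],
    "clear lap?": [np.nan, np.nan, np.nan, np.nan, "overtook", np.nan, np.nan],
})

# True where the lap has no event recorded in "clear lap?".
is_clear = df["clear lap?"].isnull()

# Broadcast "all laps in this (driver, stint) group are clear" back to every row.
keep = is_clear.groupby([df["driverRef"], df["stint"]]).transform("all")

# Keep only rows belonging to groups with no recorded event.
result = df[keep]
print(result)
```

With this data, vettel's stint 1 and the hypothetical ham stint 2 are kept, while all three rows of ham's stint 1 are removed.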
Upvotes: 1