Reputation: 886
I would like to subset a pandas dataframe based on a condition which only the first row in the groupby is subjected to.
Dataframe is to be grouped by "name", "driverRef", "tyre", "stint"
For eg, in the df below, because alonso started his stint 2 in position 12, i want to remove all of alonso's records from the df.
name driverRef stint tyre lap pos
0 Australian Grand Prix alonso 1.0 Super soft 1 9
1 Australian Grand Prix alonso 1.0 Super soft 2 9
2 Australian Grand Prix alonso 1.0 Super soft 3 9
3 Australian Grand Prix alonso 2.0 Super soft 20 12
4 Australian Grand Prix alonso 2.0 Super soft 21 11
5 Australian Grand Prix alonso 2.0 Super soft 22 10
Expected output:
name driverRef stint tyre lap pos
0 Australian Grand Prix alonso 1.0 Super soft 2 9
1 Australian Grand Prix alonso 1.0 Super soft 3 9
2 Australian Grand Prix alonso 1.0 Super soft 4 9
I tried this, but it doesn't implemenent the effect correctly:
df.loc[df.groupby(['name', 'driverRef', 'tyre', 'stint']).first().reset_index()['position'].isin(list(range(1,11))).index]
EDIT: My code does work, but please see @jezrael's answer for a more succint/better way of writing.
Upvotes: 1
Views: 41
Reputation: 863166
You are really close, need transform
for return Series with same length as original df
:
s = df.groupby(['name', 'driverRef', 'tyre', 'stint'])['pos'].transform('first')
print (s)
0 9
1 9
2 9
3 12
4 12
5 12
Name: pos, dtype: int64
df = df[s.isin(list(range(1,11)))]
print (df)
name driverRef stint tyre lap pos
0 Australian Grand Prix alonso 1.0 Super soft 1 9
1 Australian Grand Prix alonso 1.0 Super soft 2 9
2 Australian Grand Prix alonso 1.0 Super soft 3 9
Upvotes: 3