Python2.7: Subset dataframe based on condition in first row of groupby

Question

I would like to subset a pandas dataframe based on a condition which only the first row in the groupby is subjected to.

Dataframe is to be grouped by "name", "driverRef", "tyre", "stint"

For eg, in the df below, because alonso started his stint 2 in position 12, i want to remove all of alonso's records from the df.

    name                   driverRef stint  tyre      lap   pos     
0   Australian Grand Prix   alonso  1.0     Super soft  1   9        
1   Australian Grand Prix   alonso  1.0     Super soft  2   9        
2   Australian Grand Prix   alonso  1.0     Super soft  3   9       
3   Australian Grand Prix   alonso  2.0     Super soft  20   12        
4   Australian Grand Prix   alonso  2.0     Super soft  21   11     
5   Australian Grand Prix   alonso  2.0     Super soft  22   10

Expected output:

    name                   driverRef stint  tyre      lap   pos     
0   Australian Grand Prix   alonso  1.0     Super soft  2   9        
1   Australian Grand Prix   alonso  1.0     Super soft  3   9        
2   Australian Grand Prix   alonso  1.0     Super soft  4   9

I tried this, but it doesn't implemenent the effect correctly:

df.loc[df.groupby(['name', 'driverRef', 'tyre', 'stint']).first().reset_index()['position'].isin(list(range(1,11))).index]

EDIT: My code does work, but please see @jezrael's answer for a more succint/better way of writing.

jezrael · Accepted Answer

You are really close, need transform for return Series with same length as original df:

s = df.groupby(['name', 'driverRef', 'tyre', 'stint'])['pos'].transform('first')
print (s)
0     9
1     9
2     9
3    12
4    12
5    12
Name: pos, dtype: int64

df = df[s.isin(list(range(1,11)))]
print (df)
                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9

Python2.7: Subset dataframe based on condition in first row of groupby

Answers (1)

Related Questions