zainy
zainy

Reputation: 111

Is it possible to do a groupby on the result of a groupby?

I don't think I need to share the entire dataframe, but basically, this is the line of code in question (with pandas already imported, of course)

divstack = df[df['Competitor']=='Emma Slabach'].groupby(['Division','Stack'])['Time'].min()

The output is :

>>> divstack
Division  Stack 
6U F      3/3/03     2.66
          3/6/03     4.81
          Cycle     13.89
7-8 F     3/3/03     2.41
          3/6/03     2.68
          Cycle      7.71
9-10 F    3/3/03     2.13
          3/6/03     2.75
          Cycle      6.94
Name: Time, dtype: float64

I already grabbed Emma's fastest time is 2.13, thanks to this line of code:

emma = df[df['Competitor']=='Emma Slabach'].groupby(['Competitor'])['Time'].min()

and the output is:

>>> emma
Competitor
Emma Slabach    2.13 
Name: Time, dtype: float64

But how would I go about modifying the first line of code from earlier to specifically obtain the Division and Stack (along with Time) of when her fastest time occurred? (Division 9-10F and Stack 3/3/03).

I don't think a function is necessary, but is there a way I can perform another groupby on top of that first groupby output (divstack) I got, to further "minimize" and get her fastest time? Or could I input emma somewhere in divstack to obtain which division/stack that time occurs?

I need to store the division, stack, and time into divstack

Upvotes: 2

Views: 64

Answers (2)

andrew_reece
andrew_reece

Reputation: 21274

Given divstack, you can retrieve the full MultiIndex entry with .loc and min():

divstack.loc[divstack.eq(divstack.min())]

Division  Stack 
9-10 F    3/3/03    2.13
Name: Time, dtype: float64

Upvotes: 2

louis_guitton
louis_guitton

Reputation: 5727

I think you're looking for the idxmin function https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.core.groupby.DataFrameGroupBy.idxmin.html

result = df.loc[df.groupby('Competitor').Time.idxmin()]

should give you what you want for every competitor. Just filter then for Emma if you need: result.loc[result.Competitor == 'Emma Slabach']

In [6]: df = pd.DataFrame([['Emma Slabach', '6U F',  '3/3/03', 2.66], ['Emma Slabach', '7-8 F', '3/3/03', 2.41], ['Roger', '6U F', '3/3/03', 3.80]], columns=['Competitor', 'Div
   ...: ision', 'Stack', 'Time'])

In [7]: df
Out[7]: 
     Competitor Division   Stack  Time
0  Emma Slabach     6U F  3/3/03  2.66
1  Emma Slabach    7-8 F  3/3/03  2.41
2         Roger     6U F  3/3/03  3.80

In [8]: df.loc[df.groupby('Competitor').Time.idxmin()]
Out[8]: 
     Competitor Division   Stack  Time
1  Emma Slabach    7-8 F  3/3/03  2.41
2         Roger     6U F  3/3/03  3.80

Upvotes: 1

Related Questions