Aggregation on aggregated values

Question

I have a grouped data frame that I aggregated with count:

counted = df.groupby(['col1','col2']).count()

Now , counted is actually:

col1  col2
1     A    22
      B    21
      C    1
2     G    22
...

Is there any way to calculate std/max of the counted values so that I get:

col1  std
1     1.34
2      3

And so on ..

greg_data · Accepted Answer

All you have to do is call groupby() on counted, but instead of supplying the parameter 'by', supply the parameter level=0. This will return another DataFrameGroupBy object, grouped on the first level of the index (col1 here).

Then you can do aggregations all you like. Here's my example:

#set up a dataframe as in hte provided example
df=pd.DataFrame([[1,'A',22],[1,'B',21],[1,'C',1]],columns=['col1','col2','col3'])
#get the indeces to match the example
counted= df.set_index(['col1','col2'],drop=True,inplace=False)

For simplicity, I've made a df which just mimics the first of your groups:

>>> counted
8:            col3
col1 col2      
1    A       22
     B       21
     C        1

do the groupby operation again, then do aggregations:

>>> counted.groupby(level=0).std()
9:            col3
col1           
1     11.846237
>>> counted.groupby(level=0).max()
10:       col3
col1      
1       22

Aggregation on aggregated values

Answers (1)

Related Questions