How do I aggregate sub-dataframes in pandas?

Question

Suppose I have a DataFrame with a two-level MultiIndex:

In [1]: import numpy as np
      : import pandas as pd
      :
      : index = pd.MultiIndex.from_tuples([(i, j) for i in range(3)
      :                                           for j in range(1 + i)], names=list('ij'))
      : df = pd.DataFrame(0.1 * np.arange(2 * len(index)).reshape(-1, 2),
      :                   columns=list('xy'), index=index)
      : df
Out[1]:
      x    y
i j
0 0  0.0  0.1
1 0  0.2  0.3
  1  0.4  0.5
2 0  0.6  0.7
  1  0.8  0.9
  2  1.0  1.1

I want to run a custom function on every sub-DataFrame, one per value of the first index level i:

In [2]: def my_aggr_func(subdf):
      :     return subdf['x'].mean() / subdf['y'].mean()
      :
      : level0 = df.index.levels[0].values
      : pd.DataFrame({'mean_ratio': [my_aggr_func(df.loc[i]) for i in level0]},
      :              index=pd.Index(level0, name=index.names[0]) )
Out[2]:
     mean_ratio
i
0    0.000000
1    0.750000
2    0.888889

Is there an elegant way to do it with df.groupby('i').agg(__something__) or something similar?

jezrael · Accepted Answer

You need GroupBy.apply, which passes each group to your function as a DataFrame:

df1 = df.groupby('i').apply(my_aggr_func).to_frame('mean_ratio')
print(df1)
   mean_ratio
i            
0    0.000000
1    0.750000
2    0.888889
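
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's df and applies my_aggr_func per group. It also shows a possible alternative for this particular function only: since the result is just a ratio of two group means, it can be computed from a single groupby(...).mean() without calling a Python function per group. The names via_apply and via_means are mine, not from the question, and grouping with level='i' is an assumption that should be equivalent to groupby('i') here because i is an index level.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_tuples(
    [(i, j) for i in range(3) for j in range(1 + i)], names=list("ij")
)
df = pd.DataFrame(
    0.1 * np.arange(2 * len(index)).reshape(-1, 2),
    columns=list("xy"),
    index=index,
)


def my_aggr_func(subdf):
    # subdf is the sub-DataFrame for one value of index level 'i'.
    return subdf["x"].mean() / subdf["y"].mean()


# GroupBy.apply hands each group to the function as a DataFrame;
# a scalar return value per group yields a Series indexed by 'i'.
via_apply = df.groupby(level="i").apply(my_aggr_func).to_frame("mean_ratio")

# Alternative for this specific function (hypothetical name via_means):
# compute both group means at once, then divide the columns.
means = df.groupby(level="i").mean()
via_means = (means["x"] / means["y"]).to_frame("mean_ratio")

print(via_apply)
print(via_means)
```

Both frames should match the output shown in the question. The apply version stays the general tool when the custom function needs several columns of the group in a way that cannot be expressed with built-in aggregations.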
