Pandas data frame mean by variables

Question

I have a data frame:

import pandas as pd
a = pd.DataFrame({'a':[1,2,3,4], 'b':[1,1,2,2], 'c':[1,1,1,2]})
>>> a
   a  b  c
0  1  1  1
1  2  1  1
2  3  2  1
3  4  2  2

I would like to compute the mean of a after grouping the rows by the values of b and c.

So I should split the data into 3 groups:

b=1,c=1
b=2,c=1
b=2,c=2

and then compute the mean of a in each group.

How can I do that? I suspect that I have to use groupby but I do not understand how.

EdChum · Accepted Answer

You can group by multiple columns by passing a list of the column names to groupby. After that, it's just a matter of calling mean on the resulting groupby object:

In [4]:

a.groupby(['b','c']).mean()

Out[4]:
       a
b c     
1 1  1.5
2 1  3.0
  2  4.0
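As a minimal, self-contained sketch of the same idea (the variable name means is mine, not part of the original answer), you can also select column a before aggregating. The result is then a Series keyed by the (b, c) pairs that actually occur in the data, which makes it easy to look up a single group:

```python
import pandas as pd

# Same data as in the question
a = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 1, 2, 2], 'c': [1, 1, 1, 2]})

# Select column 'a' before aggregating, so the result is a Series
# indexed by the (b, c) combinations present in the data
means = a.groupby(['b', 'c'])['a'].mean()

# Look up the mean of a single group by its (b, c) key
print(means.loc[(1, 1)])  # 1.5

# All group means as a plain dict keyed by (b, c)
print(means.to_dict())
```

Selecting the column first matters more on wider frames, where calling mean on the whole groupby object would aggregate every remaining numeric column.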

The grouping columns end up in the index of the result. If you want them back as regular columns, just call reset_index():

In [5]:

a.groupby(['b','c']).mean().reset_index()

Out[5]:
   b  c    a
0  1  1  1.5
1  2  1  3.0
2  2  2  4.0
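An alternative that should give the same flat layout, sketched here as a suggestion rather than as part of the original answer, is to pass as_index=False to groupby so the grouping keys are never moved into the index (the name flat is just for illustration):

```python
import pandas as pd

# Same data as in the question
a = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 1, 2, 2], 'c': [1, 1, 1, 2]})

# as_index=False keeps the grouping keys b and c as ordinary columns,
# so no reset_index() call is needed afterwards
flat = a.groupby(['b', 'c'], as_index=False).mean()

print(flat)
```

Which form to use is mostly a matter of taste: reset_index() is handy when you already have an indexed result, while as_index=False avoids building the MultiIndex in the first place.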
