Pandas group by two and one variables

Question

Consider this toy example

pd.DataFrame([["a",1,1],["a",2,2],["b",1,4],["b",2,8]],columns=["group1","group2","value"])

  group1  group2  value
0      a       1      1
1      a       2      2
2      b       1      4
3      b       2      8

I would like to group by "group1" and "group2" and calculate the mean of "value" for each group, however, I would also like to include the results of grouping only by "group1" (so all group2 rows), which would look something like

  group1  group2  value
0      a       0    1.5
1      a       1    1.0
2      a       2    2.0
3      b       0    6.0
4      b       1    4.0
5      b       2    8.0

How can I achieve this in an efficient way?

Balaji Ambresh · Accepted Answer

Here you go:

g1 = df.groupby('group1', sort=False).value.mean().reset_index()
g1['group2'] = 0
g2 = df.groupby(['group1', 'group2'], sort=False).value.mean().reset_index()
print(pd.concat([g2, g1]).sort_values(['group1', 'group2']).reset_index(drop=True))

Output:

  group1  group2  value
0      a       0    1.5
1      a       1    1.0
2      a       2    2.0
3      b       0    6.0
4      b       1    4.0
5      b       2    8.0

Pandas group by two and one variables

Answers (2)

Related Questions