Reputation: 10385
Consider this toy example
pd.DataFrame([["a",1,1],["a",2,2],["b",1,4],["b",2,8]],columns=["group1","group2","value"])
group1 group2 value
0 a 1 1
1 a 2 2
2 b 1 4
3 b 2 8
I would like to group by "group1" and "group2" and calculate the mean of "value" for each group, however, I would also like to include the results of grouping only by "group1" (so all group2 rows), which would look something like
group1 group2 value
0 a 0 1.5
1 a 1 1.0
2 a 2 2.0
3 b 0 6.0
4 b 1 4.0
5 b 2 8.0
How can I achieve this in an efficient way?
Upvotes: 1
Views: 69
Reputation: 5012
Here you go:
g1 = df.groupby('group1', sort=False).value.mean().reset_index()
g1['group2'] = 0
g2 = df.groupby(['group1', 'group2'], sort=False).value.mean().reset_index()
print(pd.concat([g2, g1]).sort_values(['group1', 'group2']).reset_index(drop=True))
Output:
group1 group2 value
0 a 0 1.5
1 a 1 1.0
2 a 2 2.0
3 b 0 6.0
4 b 1 4.0
5 b 2 8.0
Upvotes: 1
Reputation: 150785
You can do a concat
of the original dataframe and the groupby:
(pd.concat((df,df.groupby('group1')['value'].mean()
.reset_index().assign(group2=0)),
sort=False)
.sort_values(['group1','group2'])
)
Output:
group1 group2 value
0 a 0 1.5
0 a 1 1.0
1 a 2 2.0
1 b 0 6.0
2 b 1 4.0
3 b 2 8.0
Upvotes: 1