user2974951
user2974951

Reputation: 10385

Pandas group by two and one variables

Consider this toy example

pd.DataFrame([["a",1,1],["a",2,2],["b",1,4],["b",2,8]],columns=["group1","group2","value"])

  group1  group2  value
0      a       1      1
1      a       2      2
2      b       1      4
3      b       2      8

I would like to group by "group1" and "group2" and calculate the mean of "value" for each group, however, I would also like to include the results of grouping only by "group1" (so all group2 rows), which would look something like

  group1  group2  value
0      a       0    1.5
1      a       1    1.0
2      a       2    2.0
3      b       0    6.0
4      b       1    4.0
5      b       2    8.0

How can I achieve this in an efficient way?

Upvotes: 1

Views: 69

Answers (2)

Balaji Ambresh
Balaji Ambresh

Reputation: 5012

Here you go:

g1 = df.groupby('group1', sort=False).value.mean().reset_index()
g1['group2'] = 0
g2 = df.groupby(['group1', 'group2'], sort=False).value.mean().reset_index()
print(pd.concat([g2, g1]).sort_values(['group1', 'group2']).reset_index(drop=True))

Output:

  group1  group2  value
0      a       0    1.5
1      a       1    1.0
2      a       2    2.0
3      b       0    6.0
4      b       1    4.0
5      b       2    8.0

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150785

You can do a concat of the original dataframe and the groupby:

(pd.concat((df,df.groupby('group1')['value'].mean()
                 .reset_index().assign(group2=0)),
          sort=False)
   .sort_values(['group1','group2'])
)

Output:

  group1  group2  value
0      a       0    1.5
0      a       1    1.0
1      a       2    2.0
1      b       0    6.0
2      b       1    4.0
3      b       2    8.0

Upvotes: 1

Related Questions