Reputation: 4705
I'm wondering how best to add columns to a dataframe which contain group means.
Here's a MWE:
import pandas as pd
from io import StringIO
csv = """\
pool,employee,xd1,xd2
1,a,-5.25,-3.92
1,b,-4.25,-3.92
1,c,-4.25,-2.92
1,d,-3.25,-2.92
2,e,-1.25,-0.92
2,f,-1.25,1.08
2,g,-0.25,0.08
2,h,0.75,-0.92
3,i,3.75,3.08
3,j,4.75,2.08
3,k,4.75,4.08
3,l,5.75,5.08
"""
data = pd.read_csv(StringIO(csv))
def group_means(data):
m1 = data["xd1"].mean()
m2 = data["xd2"].mean()
return pd.DataFrame({"x1_mean": [m1] * len(data), "x2_mean": [m2] * len(data)})
pd.concat(
[data, data.groupby("pool").apply(group_means).reset_index(drop=True)],
axis="columns",
)
The above will work, but I'm interested to know if there's a better approach.
Upvotes: 1
Views: 87
Reputation: 31156
A different way - calc means and then join back. This generates multi-level column index that is reset with second line.
mean = data.groupby("pool").agg(["mean"]).reset_index()
mean.columns = mean.columns.map('|'.join).str.strip('|')
pd.merge(data, mean,on="pool")
Upvotes: 0
Reputation: 18647
An alternative method could be to use transform
with groupby
. Also using join
and add_suffix
here for desired output:
(data.join(data.groupby('pool')[['xd1', 'xd2']]
.transform('mean')
.add_suffix('_mean')))
[out]
pool employee xd1 xd2 xd1_mean xd2_mean
0 1 a -5.25 -3.92 -4.25 -3.42
1 1 b -4.25 -3.92 -4.25 -3.42
2 1 c -4.25 -2.92 -4.25 -3.42
3 1 d -3.25 -2.92 -4.25 -3.42
4 2 e -1.25 -0.92 -0.50 -0.17
5 2 f -1.25 1.08 -0.50 -0.17
6 2 g -0.25 0.08 -0.50 -0.17
7 2 h 0.75 -0.92 -0.50 -0.17
8 3 i 3.75 3.08 4.75 3.58
9 3 j 4.75 2.08 4.75 3.58
10 3 k 4.75 4.08 4.75 3.58
11 3 l 5.75 5.08 4.75 3.58
Upvotes: 1