baxx
baxx

Reputation: 4705

Create group mean columns in pandas dataframe

I'm wondering how best to add columns to a dataframe which contain group means.

Here's a MWE:

import pandas as pd
from io import StringIO

csv = """\
pool,employee,xd1,xd2
1,a,-5.25,-3.92
1,b,-4.25,-3.92
1,c,-4.25,-2.92
1,d,-3.25,-2.92
2,e,-1.25,-0.92
2,f,-1.25,1.08
2,g,-0.25,0.08
2,h,0.75,-0.92
3,i,3.75,3.08
3,j,4.75,2.08
3,k,4.75,4.08
3,l,5.75,5.08
"""
data = pd.read_csv(StringIO(csv))

def group_means(data):
    m1 = data["xd1"].mean()
    m2 = data["xd2"].mean()
    return pd.DataFrame({"x1_mean": [m1] * len(data), "x2_mean": [m2] * len(data)})

pd.concat(
    [data, data.groupby("pool").apply(group_means).reset_index(drop=True)],
    axis="columns",
)

The above will work, but I'm interested to know if there's a better approach.

Upvotes: 1

Views: 87

Answers (2)

Rob Raymond
Rob Raymond

Reputation: 31156

A different way - calc means and then join back. This generates multi-level column index that is reset with second line.

mean = data.groupby("pool").agg(["mean"]).reset_index()
mean.columns = mean.columns.map('|'.join).str.strip('|')
pd.merge(data, mean,on="pool")

Upvotes: 0

Chris Adams
Chris Adams

Reputation: 18647

An alternative method could be to use transform with groupby. Also using join and add_suffix here for desired output:

(data.join(data.groupby('pool')[['xd1', 'xd2']]
           .transform('mean')
           .add_suffix('_mean')))

[out]

    pool employee   xd1   xd2  xd1_mean  xd2_mean
0      1        a -5.25 -3.92     -4.25     -3.42
1      1        b -4.25 -3.92     -4.25     -3.42
2      1        c -4.25 -2.92     -4.25     -3.42
3      1        d -3.25 -2.92     -4.25     -3.42
4      2        e -1.25 -0.92     -0.50     -0.17
5      2        f -1.25  1.08     -0.50     -0.17
6      2        g -0.25  0.08     -0.50     -0.17
7      2        h  0.75 -0.92     -0.50     -0.17
8      3        i  3.75  3.08      4.75      3.58
9      3        j  4.75  2.08      4.75      3.58
10     3        k  4.75  4.08      4.75      3.58
11     3        l  5.75  5.08      4.75      3.58

Upvotes: 1

Related Questions