Elliot.L
Elliot.L

Reputation: 11

Creating pandas aggregate column based on another column

I'm practicing stats and pandas and I want to create a third column called 'Mean' which works out the means of each grouped location so that each separate row can then perform calculations with it. Here's what I'm starting with:

import pandas as pd

df = pd.DataFrame({'Location': ['Alaska', 'Alaska', 'Amsterdam', 'Amsterdam',
                                'Arkansas', 'Arkansas'],
                   'Number': ['300', '500', '250', '600', '400', '150']}

Here's what I want it to look like:

Dataframe

The real_deviations column is to give an idea of what I'd be using it for. I figured it out using a rather long-winded way of groupby and merges but am stuck on trying to find a more streamlined solution.

Any help would be appreciated!

Thanks!

Upvotes: 1

Views: 246

Answers (2)

xyzjayne
xyzjayne

Reputation: 1387

To generate the location-specific mean, you need to do a groupby transform:

df['mean'] = df.groupby('Location')['Number'].transform('mean')

Then, you can proceed to calculate the deviations as:

df['real_deviations'] = df['Number'] - df['mean']

PS: be sure to convert your numbers in "Number" from str to int beforehand. A simple df['Number'] = df['Number'].astype(int) will do.

Upvotes: 3

Rob Raymond
Rob Raymond

Reputation: 31166

The core capability you need is transform

  • calc mean of group per row
  • calc additional columns using this
df = pd.DataFrame({'Location': ['Alaska', 'Alaska', 'Amsterdam', 'Amsterdam',
                                'Arkansas', 'Arkansas'],
                   'Number': ['300', '500', '250', '600', '400', '150']})
df["Number"] = df["Number"].astype(int)

df.assign(means=df.groupby("Location", as_index=False).transform("mean"),
          real_deviation=lambda d: d["Number"] - d["means"])

Location Number means real_deviation
0 Alaska 300 400 -100
1 Alaska 500 400 100
2 Amsterdam 250 425 -175
3 Amsterdam 600 425 175
4 Arkansas 400 275 125
5 Arkansas 150 275 -125

Upvotes: 0

Related Questions