Ignacio Peralta
Ignacio Peralta

Reputation: 23

Add a column with mean values for groups based on another column

So, basically I have a df that has in one column the different areas for each instance and in another two columns the amount they produce.

Something like this:

entity area prod_a prod_b
001 A 1 3
002 B 2 4
003 A 2 6
004 C 7 2
005 C 4 8
006 B 6 5

I want to add two new columns with the total mean for each king of production (prod_a and prod_b) discriminated by area.

For example:

entity area prod_a prod_b mean prod_a for the area mean prod_b for the area
001 A 1 3 1.5 4.5
002 B 2 4 4 2.5
003 A 2 6 1.5 4.5
004 C 7 2 5.5 5
005 C 4 8 5.5 5
006 B 6 1 4 2.5

I've already tried a lot a ideas, but neither of them worked, I'm a little lost here, so if anyone could help me it would be great!

Upvotes: 2

Views: 1519

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35676

Can use groupby transform to calculate the mean on the desired columns then join back to the initial DataFrame to add the newly created columns:

df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
        .transform('mean')  # Calculate the mean for each group
        .rename(columns='mean {} for the area'.format)  # Rename columns 
)

df:

entity area prod_a prod_b mean prod_a for the area mean prod_b for the area
001 A 1 3 1.5 4.5
002 B 2 4 4 4.5
003 A 2 6 1.5 4.5
004 C 7 2 5.5 5
005 C 4 8 5.5 5
006 B 6 5 4 4.5

Setup:

import pandas as pd

df = pd.DataFrame({
    'entity': ['001', '002', '003', '004', '005', '006'],
    'area': ['A', 'B', 'A', 'C', 'C', 'B'], 
    'prod_a': [1, 2, 2, 7, 4, 6],
    'prod_b': [3, 4, 6, 2, 8, 5]
})

Upvotes: 4

Related Questions