Add a column with mean values for groups based on another column

Question

So, basically I have a df that has in one column the different areas for each instance and in another two columns the amount they produce.

Something like this:

entity	area	prod_a	prod_b
001	A	1	3
002	B	2	4
003	A	2	6
004	C	7	2
005	C	4	8
006	B	6	5

I want to add two new columns with the total mean for each king of production (prod_a and prod_b) discriminated by area.

For example:

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	2.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	1	4	2.5

I've already tried a lot a ideas, but neither of them worked, I'm a little lost here, so if anyone could help me it would be great!

Henry Ecker · Accepted Answer

Can use groupby transform to calculate the mean on the desired columns then join back to the initial DataFrame to add the newly created columns:

df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
        .transform('mean')  # Calculate the mean for each group
        .rename(columns='mean {} for the area'.format)  # Rename columns 
)

df:

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	4.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	5	4	4.5

Setup:

import pandas as pd

df = pd.DataFrame({
    'entity': ['001', '002', '003', '004', '005', '006'],
    'area': ['A', 'B', 'A', 'C', 'C', 'B'], 
    'prod_a': [1, 2, 2, 7, 4, 6],
    'prod_b': [3, 4, 6, 2, 8, 5]
})

Add a column with mean values for groups based on another column

Answers (1)

Related Questions

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	2.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	1	4	2.5

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	4.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	5	4	4.5

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	2.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	1	4	2.5

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	4.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	5	4	4.5

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	2.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	1	4	2.5

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	4.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	5	4	4.5