Reputation: 23
So, basically I have a df that has in one column the different areas for each instance and in another two columns the amount they produce.
Something like this:
entity | area | prod_a | prod_b |
---|---|---|---|
001 | A | 1 | 3 |
002 | B | 2 | 4 |
003 | A | 2 | 6 |
004 | C | 7 | 2 |
005 | C | 4 | 8 |
006 | B | 6 | 5 |
I want to add two new columns with the total mean for each king of production (prod_a
and prod_b
) discriminated by area.
For example:
entity | area | prod_a | prod_b | mean prod_a for the area | mean prod_b for the area |
---|---|---|---|---|---|
001 | A | 1 | 3 | 1.5 | 4.5 |
002 | B | 2 | 4 | 4 | 2.5 |
003 | A | 2 | 6 | 1.5 | 4.5 |
004 | C | 7 | 2 | 5.5 | 5 |
005 | C | 4 | 8 | 5.5 | 5 |
006 | B | 6 | 1 | 4 | 2.5 |
I've already tried a lot a ideas, but neither of them worked, I'm a little lost here, so if anyone could help me it would be great!
Upvotes: 2
Views: 1519
Reputation: 35676
Can use groupby transform
to calculate the mean
on the desired columns then join
back to the initial DataFrame to add the newly created columns:
df = df.join(
df.groupby('area')[['prod_a', 'prod_b']]
.transform('mean') # Calculate the mean for each group
.rename(columns='mean {} for the area'.format) # Rename columns
)
df
:
entity | area | prod_a | prod_b | mean prod_a for the area | mean prod_b for the area |
---|---|---|---|---|---|
001 | A | 1 | 3 | 1.5 | 4.5 |
002 | B | 2 | 4 | 4 | 4.5 |
003 | A | 2 | 6 | 1.5 | 4.5 |
004 | C | 7 | 2 | 5.5 | 5 |
005 | C | 4 | 8 | 5.5 | 5 |
006 | B | 6 | 5 | 4 | 4.5 |
Setup:
import pandas as pd
df = pd.DataFrame({
'entity': ['001', '002', '003', '004', '005', '006'],
'area': ['A', 'B', 'A', 'C', 'C', 'B'],
'prod_a': [1, 2, 2, 7, 4, 6],
'prod_b': [3, 4, 6, 2, 8, 5]
})
Upvotes: 4