Reputation: 647
df1:
A B
0 2002-01-13 3.9
1 2002-01-13 1.9
2 2002-01-14 8.0
3 2002-01-14 9.0
I want to create a new column df1["C"]
with means of B
values, per each A
group.
Output should be:
A B C
0 2002-01-13 3.9 2.9
1 2002-01-13 1.9 2.9
2 2002-01-14 8.0 8.5
3 2002-01-14 9.0 8.5
And now I want to assign C
values to each A
group, to another df2
.
df2:
A D
0 2002-01-13 Joseph
1 2002-01-13 Emma
2 2002-01-13 Michael
3 2002-01-14 Anna
4 2002-01-14 Yvonne
5 2002-01-14 Anthony
Output should be:
A D E
0 2002-01-13 Joseph 2.9
1 2002-01-13 Emma 2.9
2 2002-01-13 Michael 2.9
3 2002-01-14 Anna 8.5
4 2002-01-14 Yvonne 8.5
5 2002-01-14 Anthony 8.5
I´ve tried:
df1["C"] = df1.groupby("A")["B"].mean()
Upvotes: 3
Views: 75
Reputation: 28303
Posting since others did not mention using pd.merge
or DataFrame.join
.
If only the final output is required:
pd.merge(df2, df1.groupby('A', as_index=False).B.agg('mean').rename(columns={'B':'E'}), on='A')
#outputs:
A D E
0 2002-01-13 Joseph 2.9
1 2002-01-13 Emma 2.9
2 2002-01-13 Michael 2.9
3 2002-01-14 Anna 8.5
4 2002-01-14 Yvonne 8.5
5 2002-01-14 Anthony 8.5
I have a hunch that the join based solution will be faster than the map based solutions given large data frames.
Upvotes: 0
Reputation: 164773
Part 1
df['C'] = df.groupby('A')['B'].transform('mean')
The reason your code does not work is your groupby
object returns a series indexed by A
.
Parts 1 & 2
You could perform your transformations by mapping A
to a precalculated groupby
object in each dataframe.
s = df1.groupby('A')['B'].mean()
df['C'] = df['A'].map(s)
df2['E'] = df2['A'].map(s)
Upvotes: 0
Reputation: 334
You can use
df['C'] = df['A'].replace(df.groupby('A')['B'].mean().to_dict())
Upvotes: 0
Reputation: 38415
You don't have to add a column to df1, you can directly map the values from the groupby df1 to df2.
df2['E'] = df2['A'].map(df1.groupby('A').B.mean())
A D E
0 2002-01-13 Joseph 2.9
1 2002-01-13 Emma 2.9
2 2002-01-13 Michael 2.9
3 2002-01-14 Anna 8.5
4 2002-01-14 Yvonne 8.5
5 2002-01-14 Anthony 8.5
Upvotes: 3
Reputation: 323316
First question transform
df1['C'] = df1.groupby('A').B.transform('mean')
Second using map
(Notice I am using the df1 directly cause I adding drop_duplicates
)
df2['E']=df2.A.map(df1.drop_duplicates('A').set_index('A').C)
Upvotes: 2