How can I add the means of b and c to my dataframe? I tried a merge, but it didn't seem to work. I want two extra columns, b_mean and c_mean, added to my dataframe with the results of df.groupby('date').mean().
DataFrame:
   a  b  c  date
0  2  3  5     1
1  5  9  1     1
2  3  7  1     1
I have the following code:
import pandas as pd
a = [{'date': 1,'a':2, 'b':3, 'c':5}, {'date':1, 'a':5, 'b':9, 'c':1}, {'date':1, 'a':3, 'b':7, 'c':1}]
df = pd.DataFrame(a)
x = df.groupby('date').mean()
Edit:
To clarify the desired output, here is what df.groupby('date').mean() returns:
             a         b         c
date
1     3.333333  6.333333  2.333333
My desired result is the following dataframe:
   a  b  c  date  a_mean  b_mean
0  2  3  5     1  3.3333  6.3333
1  5  9  1     1  3.3333  6.3333
2  3  7  1     1  3.3333  6.3333
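For reference, the merge approach mentioned in the question can be made to work. A minimal sketch, assuming the b_mean and c_mean columns described at the top are the goal: the grouped means are indexed by date rather than holding it as a column, so the merge pairs left_on='date' with right_index=True, and add_suffix supplies the _mean names.

```python
import pandas as pd

a = [{'date': 1, 'a': 2, 'b': 3, 'c': 5},
     {'date': 1, 'a': 5, 'b': 9, 'c': 1},
     {'date': 1, 'a': 3, 'b': 7, 'c': 1}]
df = pd.DataFrame(a)

# Per-date means of b and c, renamed to b_mean and c_mean; the index is 'date'
means = df.groupby('date')[['b', 'c']].mean().add_suffix('_mean')

# Match each row's 'date' column against the index of `means`
result = df.merge(means, left_on='date', right_index=True, how='left')
print(result)
```

A plain merge on 'date' fails to line up here because, after groupby, 'date' lives in the index of the means frame, not in a column.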
Upvotes: 8
Reputation: 2155
I'm assuming that you need the per-date mean of a column added as a new column in the dataframe. Please correct me if not.
You can achieve this by computing each column's group mean with groupby(...).transform('mean') and assigning the result to a new column:
In [1]: import pandas as pd
In [2]: a = [{'date': 1,'a':2, 'b':3, 'c':5}, {'date':1, 'a':5, 'b':9, 'c':1}, {'date':1, 'a':3, 'b':7, 'c':1}]
In [3]: df = pd.DataFrame(a)
In [4]: for col in ['b', 'c']:
   ...:     df[col + "_mean"] = df.groupby('date')[col].transform('mean')
In [5]: df
Out[5]:
   a  b  c  date    b_mean    c_mean
0  2  3  5     1  6.333333  2.333333
1  5  9  1     1  6.333333  2.333333
2  3  7  1     1  6.333333  2.333333
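The loop can also be collapsed into a single transform over several columns. A minimal sketch; the fourth row with date 2 is hypothetical, added only to show that each row receives its own group's mean rather than the mean of the whole column.

```python
import pandas as pd

a = [{'date': 1, 'a': 2, 'b': 3, 'c': 5},
     {'date': 1, 'a': 5, 'b': 9, 'c': 1},
     {'date': 1, 'a': 3, 'b': 7, 'c': 1},
     {'date': 2, 'a': 4, 'b': 8, 'c': 6}]  # hypothetical extra group
df = pd.DataFrame(a)

cols = ['b', 'c']
# transform('mean') returns one row per original row, indexed like df,
# so renaming with add_suffix and joining on the index lines it up
df = df.join(df.groupby('date')[cols].transform('mean').add_suffix('_mean'))
print(df)
```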
Upvotes: 3
Reputation: 294308
solution
Use join with the rsuffix parameter:
df.join(df.groupby('date').mean(), on='date', rsuffix='_mean')
   a  b  c  date    a_mean    b_mean    c_mean
0  2  3  5     1  3.333333  6.333333  2.333333
1  5  9  1     1  3.333333  6.333333  2.333333
2  3  7  1     1  3.333333  6.333333  2.333333
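With on='date', join looks up each row's date column in the index of the grouped frame (which is date), and rsuffix only renames the overlapping columns a, b and c. A minimal sketch of the same join that renames up front with add_suffix, so no rsuffix is needed, and checks the result against a per-group transform like the one in the other answers.

```python
import pandas as pd

a = [{'date': 1, 'a': 2, 'b': 3, 'c': 5},
     {'date': 1, 'a': 5, 'b': 9, 'c': 1},
     {'date': 1, 'a': 3, 'b': 7, 'c': 1}]
df = pd.DataFrame(a)

# One row per date, indexed by 'date', with columns already named *_mean
means = df.groupby('date')[['a', 'b', 'c']].mean().add_suffix('_mean')

# on='date' matches df's 'date' column against the index of `means`
joined = df.join(means, on='date')

# The joined column should match a per-group transform row for row
expected = df.groupby('date')['b'].transform('mean')
print((joined['b_mean'] - expected).abs().max())
```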
We can limit it to just the columns ['a', 'b']:
df.join(df.groupby('date')[['a', 'b']].mean(), on='date', rsuffix='_mean')
   a  b  c  date    a_mean    b_mean
0  2  3  5     1  3.333333  6.333333
1  5  9  1     1  3.333333  6.333333
2  3  7  1     1  3.333333  6.333333
extra credit
Not really answering your question... but I thought it was neat!
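d1 = df.set_index('date', append=True).swaplevel(0, 1)
# On recent pandas, groupby().describe() returns one wide row per date,
# so stack the statistic level back into the index to get (date, stat) rows
g = df.groupby('date').describe().stack(level=1)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([d1, g]).sort_index()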
                  a         b         c
date
1    0     2.000000  3.000000  5.000000
     1     5.000000  9.000000  1.000000
     2     3.000000  7.000000  1.000000
     25%   2.500000  5.000000  1.000000
     50%   3.000000  7.000000  1.000000
     75%   4.000000  8.000000  3.000000
     count 3.000000  3.000000  3.000000
     max   5.000000  9.000000  5.000000
     mean  3.333333  6.333333  2.333333
     min   2.000000  3.000000  1.000000
     std   1.527525  3.055050  2.309401
Upvotes: 5
Reputation: 2544
As @ayhan mentioned, you can use groupby(...).transform() for this. transform is like apply, but its result keeps the same index as the original dataframe instead of having one row per unique value of the column(s) grouped on, so it can be assigned directly as a new column.
df['a_mean'] = df.groupby('date')['a'].transform('mean')
df['b_mean'] = df.groupby('date')['b'].transform('mean')
>>> df
   a  b  c  date    a_mean    b_mean
0  2  3  5     1  3.333333  6.333333
1  5  9  1     1  3.333333  6.333333
2  3  7  1     1  3.333333  6.333333
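To see the index difference concretely, a small sketch that compares transform with a plain group aggregation (rather than apply); the two-date data is hypothetical, chosen so that the groups differ.

```python
import pandas as pd

# Hypothetical data with two dates so the groups have different means
df = pd.DataFrame({'date': [1, 1, 1, 2], 'a': [2, 5, 3, 4]})

# Aggregation: one value per date, indexed by the unique dates
agg = df.groupby('date')['a'].mean()

# transform: one value per original row, indexed exactly like df
tr = df.groupby('date')['a'].transform('mean')

print(agg)
print(tr)
```

Because tr shares df's index, df['a_mean'] = tr lines up row for row, while assigning agg directly would align on the wrong labels.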
Upvotes: 7