Reputation: 567
I want to calculate the per-Id standard deviation of some DataFrame columns and merge it back into the DataFrame, something like this:
std = all_data.groupby(['Id'])[features].agg('std')
all_data = pd.merge(all_data, std.reset_index(), suffixes=["", "_std"], how='left', on=['Id'])
but it seems there is no such thing as .agg('std').
Upvotes: 1
Views: 1486
Reputation: 863481
Your solution works fine for me.
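For reference, here is a minimal runnable version of the merge approach from the question, assuming a small hypothetical sample frame (only columns `B`, `C` and `Id`). `agg('std')` returns one row per `Id`, `reset_index()` turns `Id` back into a column so `merge` can join on it, and `suffixes=["", "_std"]` keeps the original names on the left while appending `_std` to the overlapping feature columns coming from the right.

```python
import pandas as pd

# Hypothetical sample data: two groups ('a', 'b') with numeric features
all_data = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'Id': list('aaabbb'),
})
features = ['B', 'C']

# One row per Id with the (sample) standard deviation of each feature
std = all_data.groupby(['Id'])[features].agg('std')

# Join the per-group values back onto every row; the overlapping
# feature columns from the right side get the '_std' suffix
all_data = pd.merge(all_data, std.reset_index(),
                    suffixes=["", "_std"], how='left', on=['Id'])
print(all_data)
```

This works, but it builds an intermediate frame and a join; `transform` avoids both.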
I think you need transform to avoid using merge: it returns a result with the same size (and index) as the original DataFrame, so it can be assigned directly as new columns:
all_data = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'Id':list('aaabbb')
})
#print (all_data)
features = ['B','C','D']
#new column names
cols = ['{}_std'.format(x) for x in features]
#python 3.6+ solution with f-strings
#cols = [f'{x}_std' for x in features]
all_data[cols] = all_data.groupby(['Id'])[features].transform('std')
print (all_data)
   A  B  C  D  E Id    B_std  C_std     D_std
0  a  4  7  1  5  a  0.57735      1  2.000000
1  b  5  8  3  3  a  0.57735      1  2.000000
2  c  4  9  5  6  a  0.57735      1  2.000000
3  d  5  4  7  9  b  0.57735      1  3.785939
4  e  5  2  1  2  b  0.57735      1  3.785939
5  f  4  3  0  4  b  0.57735      1  3.785939
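Note that pandas `std` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1), which is why `B_std` above is 0.57735 rather than about 0.4714. The sketch below assumes you might want the population standard deviation instead and passes `ddof=0` through a lambda inside `transform`; NumPy's `np.std` (which defaults to `ddof=0`) is used only as a cross-check. The sample frame is a hypothetical subset of the one above.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the sample data above
all_data = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 3, 5, 7, 1, 0],
    'Id': list('aaabbb'),
})
features = ['B', 'D']

g = all_data.groupby('Id')[features]

# Default: sample standard deviation (ddof=1), broadcast back to every row
sample_std = g.transform('std')

# Population standard deviation (ddof=0); the per-group result is broadcast too
pop_std = g.transform(lambda x: x.std(ddof=0))

# Cross-check group 'a' of column B against NumPy
b_a = all_data.loc[all_data['Id'] == 'a', 'B'].to_numpy()
print(sample_std.loc[0, 'B'], np.std(b_a, ddof=1))  # both about 0.57735
print(pop_std.loc[0, 'B'], np.std(b_a))             # both about 0.471405
```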
Upvotes: 3