How do I calculate the standard deviation of a pandas DataFrame in Python

Question

I want to calculate the standard deviation of some feature columns per Id and merge it back into the DataFrame, something like this:

std = all_data.groupby(['Id'])[features].agg('std')
all_data = pd.merge(all_data, std.reset_index(), suffixes=["", "_std"], how='left', on=['Id'])

but there is no such thing as .agg('std').

jezrael · Accepted Answer

Your solution works fine for me; .agg('std') is a valid aggregation.
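For reference, here is a minimal runnable sketch of the groupby/agg/merge approach from the question, using the same illustrative sample data as the example below. `agg('std')` returns one row per `Id`, and the left merge copies those values back onto every matching row:

```python
import pandas as pd

# Illustrative sample data (same values as the transform example below)
all_data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'Id': list('aaabbb'),
})
features = ['B', 'C', 'D']

# One row per Id holding the sample standard deviation of each feature
std = all_data.groupby(['Id'])[features].agg('std')

# Left-merge back: overlapping names keep their name on the left side
# and get the "_std" suffix on the right side
all_data = pd.merge(all_data, std.reset_index(),
                    suffixes=("", "_std"), how='left', on=['Id'])
print(all_data)
```

This works, but it needs an intermediate aggregated frame plus a merge, which is what transform avoids.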

I think you need transform to avoid using merge: it returns new columns with the same size (and index) as the original DataFrame, so they can be assigned directly:

import pandas as pd

all_data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'Id': list('aaabbb')
})

#print (all_data)

features = ['B','C','D']
#new columns names
cols = ['{}_std'.format(x) for x in features]
#python 3.6+ solution with f-strings
#cols = [f'{x}_std' for x in features]

all_data[cols] = all_data.groupby(['Id'])[features].transform('std')
print (all_data)
   A  B  C  D  E Id    B_std  C_std     D_std
0  a  4  7  1  5  a  0.57735      1  2.000000
1  b  5  8  3  3  a  0.57735      1  2.000000
2  c  4  9  5  6  a  0.57735      1  2.000000
3  d  5  4  7  9  b  0.57735      1  3.785939
4  e  5  2  1  2  b  0.57735      1  3.785939
5  f  4  3  0  4  b  0.57735      1  3.785939
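Note that 'std' in pandas is the sample standard deviation (ddof=1, dividing by n - 1), which is why group a gets B_std = 0.57735 for the values 4, 5, 4. If the population standard deviation (ddof=0, the default of NumPy's np.std) is wanted instead, the keyword can be passed through transform. A minimal sketch on a reduced version of the sample data; the "_std0" suffix is just an illustrative choice:

```python
import numpy as np
import pandas as pd

all_data = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 3, 5, 7, 1, 0],
    'Id': list('aaabbb'),
})
features = ['B', 'D']

# Manual check of the sample std (ddof=1) for Id 'a', column B: values 4, 5, 4
vals = np.array([4, 5, 4])
manual = np.sqrt(((vals - vals.mean()) ** 2).sum() / (len(vals) - 1))
print(manual)  # about 0.57735

# Population std (divide by n) per group, broadcast back to every row;
# ddof=0 is forwarded by transform to GroupBy.std
pop = all_data.groupby('Id')[features].transform('std', ddof=0)

# pop has the same index as all_data, so an index join lines the rows up
all_data = all_data.join(pop.add_suffix('_std0'))
print(all_data)
```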
