mean and stddev of non-zero columns of dataframe

Question

I have a dataframe with several columns, and each column contains some positive, negative and zero values. For each column, I want to calculate x + y, where x and y are the mean and standard deviation of the absolute non-zero values in that column. How can I do this in Python?
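
Written out, the quantity being asked for is the following, for a column whose non-zero entries are v_1, ..., v_n. The symbols v_i and n are notation introduced here for illustration, and y assumes the sample standard deviation, which is what pandas' std uses by default:

```latex
x = \frac{1}{n}\sum_{i=1}^{n} \lvert v_i \rvert, \qquad
y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \bigl(\lvert v_i \rvert - x\bigr)^2}, \qquad
\text{result} = x + y
```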

EdChum · Accepted Answer

You can mask out the zeros with a boolean condition (masked entries become NaN, which describe skips), take the absolute values, and then iterate over the columns, calling describe on each and picking out its mean and std entries:

In [103]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.random.randn(10), 'b':np.random.randn(10), 'c':np.random.randn(10)})
df
Out[103]:
          a         b         c
0  0.566926 -1.103313 -0.834149
1 -0.183890 -0.222727 -0.915141
2  0.340611 -0.278525 -0.992135
3  0.380519 -1.546856  0.801598
4 -0.596142  0.494078 -0.423959
5 -0.064408  0.475466  0.220138
6 -0.549479  1.453362  2.696673
7  1.279865  0.796222  0.391247
8  0.778623  1.033530  1.264428
9 -1.669838 -1.117719  0.761952
In [111]:

filtered = df[df != 0].abs()  # zeros -> NaN (skipped by describe), then absolute values
for col in filtered:
    stats = filtered[col].describe()[['mean', 'std']]
    print('col:', col, stats)
    print('mean + std:', stats['mean'] + stats['std'])  # the x + y from the question
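
The loop is not strictly needed to get x + y. Below is a minimal vectorized sketch that returns it for every column at once. The function name abs_nonzero_mean_plus_std and the example frame are hypothetical, and it assumes the sample standard deviation (pandas' default ddof=1) is what is wanted.

```python
import pandas as pd


def abs_nonzero_mean_plus_std(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per column, mean + std of |value| over non-zero entries.

    DataFrame.where keeps entries where the condition is True and sets the
    rest (the zeros) to NaN; mean() and std() skip NaN by default.
    std() is the sample standard deviation (ddof=1) by default.
    """
    abs_nonzero = df.abs().where(df != 0)
    return abs_nonzero.mean() + abs_nonzero.std()


# Hypothetical example frame with positive, negative and zero values
example = pd.DataFrame({
    'a': [1.0, -3.0, 0.0, 2.0],   # non-zero |values|: 1, 3, 2 -> mean 2, std 1
    'b': [0.0, -4.0, 4.0, 0.0],   # non-zero |values|: 4, 4    -> mean 4, std 0
})
print(abs_nonzero_mean_plus_std(example))
```

With ddof=1, a column that has only one non-zero value yields NaN for its std (and hence for x + y); calling std(ddof=0) instead gives the population standard deviation.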
