Compute standard deviation for cells of several DataFrames

Question

I have measurement data in similarly structured pandas DataFrames and need to compute a standard deviation for each individual cell across the DataFrames, not for entire rows or columns. I could loop over the cells, but the datasets are quite large, so that would not be efficient.

import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1,1,1],[2,2,2]])
df2 = pd.DataFrame([[0.9,0.8,0.7],[1.9,1.8,1.7]])
df3 = pd.DataFrame([[1.1,1.2,1.2],[2.1,2.2,2.2]])

The desired result would be

    0           1           2
0   0.08165     0.163299    0.244949
1   0.08165     0.163299    0.244949

Thanks!

G. Anderson · Accepted Answer

If I understand your problem correctly, you can use numpy.dstack to stack the values into a 3D array, then compute the standard deviation over the stacked axis (axis=2):

np.dstack((df1.values,df2.values,df3.values)).std(axis=2)

array([[0.08164966, 0.16329932, 0.20548047],
       [0.08164966, 0.16329932, 0.20548047]])

Note that in newer versions of pandas, the preferred way to convert the DataFrame values to an array is to_numpy() rather than .values:

np.dstack((df1.to_numpy(),df2.to_numpy(),df3.to_numpy())).std(axis=2)

This gives the same result.
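If you want the result back as a DataFrame with the original labels, something along these lines should work. This is only a sketch: the names frames, stacked, cell_std and cell_std_sample are mine, and it assumes all DataFrames share the same shape, index and columns. Also note that NumPy's std defaults to ddof=0 (population standard deviation), while pandas' own std defaults to ddof=1, so pass ddof=1 if you want the sample standard deviation.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 1, 1], [2, 2, 2]])
df2 = pd.DataFrame([[0.9, 0.8, 0.7], [1.9, 1.8, 1.7]])
df3 = pd.DataFrame([[1.1, 1.2, 1.2], [2.1, 2.2, 2.2]])

# Hypothetical helper list; assumes identical shape and labels in every frame
frames = [df1, df2, df3]

# Stack along a new third axis: shape is (rows, columns, number of frames)
stacked = np.dstack([df.to_numpy() for df in frames])

# Population standard deviation per cell (NumPy default, ddof=0)
cell_std = pd.DataFrame(stacked.std(axis=2),
                        index=df1.index, columns=df1.columns)

# Sample standard deviation per cell (ddof=1, the pandas default)
cell_std_sample = pd.DataFrame(stacked.std(axis=2, ddof=1),
                               index=df1.index, columns=df1.columns)

print(cell_std)
print(cell_std_sample)
```

The list comprehension makes it easy to extend this to any number of DataFrames without writing each one out by hand.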
