I have measurement data in several Pandas DataFrames with the same structure, and I need to compute the standard deviation of each individual cell across the DataFrames, not of entire rows or columns. I could do this with loops, but the datasets are quite large, so that would not be efficient.
import pandas as pd
df1 = pd.DataFrame([[1,1,1],[2,2,2]])
df2 = pd.DataFrame([[0.9,0.8,0.7],[1.9,1.8,1.7]])
df3 = pd.DataFrame([[1.1,1.2,1.2],[2.1,2.2,2.2]])
The desired result would be:
         0         1         2
0  0.08165  0.163299  0.244949
1  0.08165  0.163299  0.244949
Thanks!
If I understand your problem correctly, you can use numpy.dstack to stack the values into a 3D array and then compute the standard deviation along the stacked (third) axis:
import numpy as np
np.dstack((df1.values,df2.values,df3.values)).std(axis=2)
array([[0.08164966, 0.16329932, 0.20548047],
[0.08164966, 0.16329932, 0.20548047]])
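To see why axis=2 is the right axis, it helps to look at the shape of the stacked array: np.dstack places each DataFrame along a new third axis, so every (row, column) cell owns a vector of three measurements. Note also that ndarray.std defaults to ddof=0 (population standard deviation), which matches the numbers in the expected output; pass ddof=1 if you want the sample standard deviation instead. A minimal sketch using the example frames from the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 1, 1], [2, 2, 2]])
df2 = pd.DataFrame([[0.9, 0.8, 0.7], [1.9, 1.8, 1.7]])
df3 = pd.DataFrame([[1.1, 1.2, 1.2], [2.1, 2.2, 2.2]])

# Stack along a new last axis: shape is (rows, columns, number_of_frames)
stacked = np.dstack((df1.values, df2.values, df3.values))
print(stacked.shape)  # (2, 3, 3)

# stacked[i, j] holds the three measurements of cell (i, j)
print(stacked[0, 2])

# Population standard deviation (ddof=0, the NumPy default): divides by n
pop_std = stacked.std(axis=2)

# Sample standard deviation (ddof=1): divides by n - 1 instead of n
sample_std = stacked.std(axis=2, ddof=1)
print(pop_std)
print(sample_std)
```

With only three measurements per cell the two conventions differ noticeably (by a factor of sqrt(3/2)), so it is worth deciding which one your analysis needs.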
Note that in newer versions of pandas (0.24 and later), the preferred way to convert DataFrame values to a NumPy array is to_numpy() rather than .values:
np.dstack((df1.to_numpy(),df2.to_numpy(),df3.to_numpy())).std(axis=2)
This gives the same result.
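If you have more than three frames, or want the result back as a DataFrame with the original index and column labels, the same idea can be wrapped in a small helper. The function name `cellwise_std` is hypothetical, and the sketch assumes all frames have the same shape and the same row/column order: it works on positional values and does not re-align on labels.

```python
import numpy as np
import pandas as pd


def cellwise_std(frames, ddof=0):
    """Standard deviation of each cell across equally shaped DataFrames.

    Hypothetical helper: assumes every frame has the same shape and the same
    row/column order; labels are taken from the first frame, not re-aligned.
    """
    # np.stack adds a new leading axis: shape (n_frames, rows, columns)
    stacked = np.stack([f.to_numpy(dtype=float) for f in frames], axis=0)
    # Reduce over the frame axis, leaving one value per cell
    values = stacked.std(axis=0, ddof=ddof)
    first = frames[0]
    return pd.DataFrame(values, index=first.index, columns=first.columns)


df1 = pd.DataFrame([[1, 1, 1], [2, 2, 2]])
df2 = pd.DataFrame([[0.9, 0.8, 0.7], [1.9, 1.8, 1.7]])
df3 = pd.DataFrame([[1.1, 1.2, 1.2], [2.1, 2.2, 2.2]])

result = cellwise_std([df1, df2, df3])
print(result)
```

Stacking along axis=0 and reducing over axis=0 is equivalent to the dstack/axis=2 version; putting the frames on the leading axis simply makes it convenient to pass a list of any length.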