Reputation: 365
Do Python and R have different ways to compute the standard deviation (sd)?
For example, in Python, starting with:
import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(a.std(axis=1))
### per row : [0.81649658 0.81649658 0.81649658]
print(a.std(axis=0))
### per column : [2.44948974 2.44948974 2.44948974]
While in R:
z <- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3, byrow=TRUE)
# z
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 4 5 6
#[3,] 7 8 9
# apply(z, 1, sd)
sd(z[1,]) #1
sd(z[2,]) #1
sd(z[3,]) #1
# apply(z, 2, sd)
sd(z[,1]) #3
sd(z[,2]) #3
sd(z[,3]) #3
Upvotes: 2
Views: 802
Reputation: 8364
It's because sd() from base R uses n-1 in the denominator of the formula by default (docs). numpy instead uses n by default, but you can use the ddof argument (docs) to apply the n-1 correction.
import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(a.std(axis=1, ddof=1))
# [1. 1. 1.]
Basically, the mean of the squared deviations can be calculated with a denominator of n, sum((x - mean(x))^2) / n, or with a denominator of n - 1, sum((x - mean(x))^2) / (n - 1); the standard deviation is the square root of that quantity.
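To make the two formulas concrete, here is a minimal sketch (reusing the first row of the example above, [1, 2, 3]) that computes both versions by hand and checks them against np.std:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
n = len(x)
sq_dev = (x - x.mean()) ** 2                 # squared deviations from the mean

pop_sd = np.sqrt(sq_dev.sum() / n)           # denominator n   -> numpy's default
sample_sd = np.sqrt(sq_dev.sum() / (n - 1))  # denominator n-1 -> R's sd()

print(pop_sd, np.std(x))                     # 0.816..., 0.816...
print(sample_sd, np.std(x, ddof=1))          # 1.0, 1.0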
Note that the difference between the two methods becomes small when n is big. In the example it is quite visible because n=3; try with n=1000 and the difference almost disappears.
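A quick way to see this (the sample data and the seed below are arbitrary, just for illustration):

import numpy as np

rng = np.random.default_rng(0)
for n in (3, 1000):
    x = rng.normal(size=n)
    # population (ddof=0) vs sample (ddof=1) standard deviation
    print(n, x.std(ddof=0), x.std(ddof=1))

For n=3 the two values differ noticeably; for n=1000 they agree to roughly three decimal places.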
Further reading on why the n-1 correction is used.
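As a rough empirical illustration of why the correction is used (a simulation sketch with arbitrary parameters): averaging the two estimators over many small samples shows that the n denominator systematically underestimates the true variance, while the n-1 denominator does not.

import numpy as np

rng = np.random.default_rng(1)
# 100,000 samples of size 3 from a normal distribution with sd = 2 (true variance 4)
samples = rng.normal(scale=2.0, size=(100_000, 3))
print(samples.var(axis=1, ddof=0).mean())  # ~2.67, biased low ((n-1)/n * 4)
print(samples.var(axis=1, ddof=1).mean())  # ~4.0, approximately unbiased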
Upvotes: 7