Bogdan

Reputation: 365

Computing standard deviation in R and in Python

Do Python and R compute the standard deviation (sd) in different ways?

For example, in Python, starting with:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a.std(axis=1)) 
### per row : [0.81649658 0.81649658 0.81649658]

print(a.std(axis=0)) 
### per column : [2.44948974 2.44948974 2.44948974]

While in R:

z <- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3, byrow=TRUE)

# z
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# apply(z, 1, sd)
sd(z[1,]) #1
sd(z[2,]) #1
sd(z[3,]) #1

# apply(z, 2, sd)
sd(z[,1]) #3
sd(z[,2]) #3
sd(z[,3]) #3

Upvotes: 2

Views: 802

Answers (1)

RLave

Reputation: 8364

It's because base R's sd() uses n-1 in the denominator by default (docs).

NumPy instead divides by n by default (ddof=0), but you can pass ddof=1 (docs) to get the n-1 correction.

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a.std(axis=1, ddof=1))
# [1. 1. 1.]

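Basically, the averaging step (the mean of the squared deviations from the mean) can use a factor of 1/n or 1/(n-1); the standard deviation is the square root of the result.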
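To make the two denominators concrete, here is a minimal sketch that computes the standard deviation by hand. The helper name std_manual is hypothetical (it is not part of NumPy); its ddof argument just mirrors the NumPy parameter of the same name.

```python
import numpy as np

def std_manual(x, ddof=0):
    """Standard deviation with denominator n - ddof (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Sum of squared deviations from the mean
    squared_dev = ((x - x.mean()) ** 2).sum()
    # ddof=0 divides by n (NumPy default); ddof=1 divides by n-1 (R's sd)
    return np.sqrt(squared_dev / (n - ddof))

row = [1, 2, 3]
pop_sd = std_manual(row, ddof=0)     # sqrt(2/3), matches np.std(row)
sample_sd = std_manual(row, ddof=1)  # sqrt(2/2) = 1, matches sd(c(1,2,3)) in R
print(pop_sd, sample_sd)
```

For the first row of the example, the sum of squared deviations is 2, so dividing by n=3 gives about 0.8165 and dividing by n-1=2 gives exactly 1.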

Note that the difference between the two methods is small when n is large. In the example it is very visible because n=3; with n=1000 the two results nearly coincide.
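A quick way to see how the gap shrinks is to compare the two estimates for a small and a large sample. This is only an illustrative sketch: the helper sd_ratio and the choice of the data 1..n are assumptions made for the example. The ratio of the two results is sqrt(n/(n-1)) regardless of the data.

```python
import numpy as np

def sd_ratio(n):
    """Ratio of the n-1 estimate to the n estimate for the data 1..n (hypothetical helper)."""
    x = np.arange(1, n + 1, dtype=float)
    return x.std(ddof=1) / x.std(ddof=0)

for n in (3, 1000):
    # The ratio equals sqrt(n / (n - 1)) and approaches 1 as n grows
    print(n, sd_ratio(n))
```

With n=3 the n-1 version is about 22% larger (ratio about 1.2247), while with n=1000 the ratio is about 1.0005.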

Further reading on why the n-1 correction is used.

Upvotes: 7
