Ayoor J. Daves

Reputation: 39

Difference in Variance computation

I have manually computed the variance of two data sets using the definitional formula, the computational (one-pass) formula, and the built-in R function.

library(tibble)

set.seed(12345)
n <- 1e7
df <- tibble(
  small = rnorm(n, mean = 100, sd = 1),
  large = rnorm(n, mean = 1e8, sd = 1)
)

# Definitional (two-pass) formula: sum of squared deviations from the mean
varFuncd <- function(x) {
  x <- as.numeric(as.character(x))
  x <- x[!is.na(x)]
  sum((x - mean(x))^2) / (length(x) - 1)
}

# Computational (one-pass) formula: difference of sums of squares
varFuncc <- function(x) {
  x <- as.numeric(as.character(x))
  x <- x[!is.na(x)]
  (sum(x^2) - sum(x)^2 / length(x)) / (length(x) - 1)
}
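For example, one way to apply both functions (and base R's var() for comparison) to the two columns is with sapply():

# Apply each estimator to both columns. varFuncd() and var() give ~1 for
# both columns, while varFuncc() drifts noticeably away from 1 on `large`.
sapply(df, varFuncd)
sapply(df, varFuncc)
sapply(df, var)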

The computational formula, however, gives an unexpectedly large variance (about 1.6) for the large column, while the definitional formula gives about 1. What might be the reason?

My response is:

All the definitional expressions produced the expected variance of about 1, but the computational expression for "large" produced a larger value. The definitional formula squares the deviations from the mean, which are small numbers, so the result stays accurate. The computational formula instead takes a difference of sums of squares: when the underlying values are large, squaring them produces enormous numbers, and subtracting two nearly equal enormous numbers loses most of the significant digits before the division by n - 1.
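A rough magnitude check (using the n and mean from the code above) makes the loss concrete:

# Each value in `large` is about 1e8, so x^2 is about 1e16 and the sum
# over 1e7 values is about 1e23. A double carries only ~15-16 significant
# digits, so near 1e23 the absolute rounding error is comparable to the
# true sum of squared deviations (~1e7) that the subtraction must recover.
.Machine$double.eps * 1e23   # ~2e7: rounding granularity near 1e23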

Upvotes: 0

Views: 93

Answers (1)

Paul Raff

Reputation: 93

I agree that you're running into numerical stability problems, since R stores numeric values as double-precision floating-point numbers. From Wikipedia, discussing the specific formula for variance that you are using in varFuncc:

This equation should not be used for computations using floating point arithmetic because it suffers from catastrophic cancellation if the two components of the equation are similar in magnitude. There exist numerically stable alternatives.
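As one example of a stable alternative (a sketch of my own, not taken from your code): variance is unaffected by shifting the data, so you can subtract a constant close to the mean before applying the same one-pass formula, which keeps the intermediate sums small.

# Shifted one-pass variance: subtract a constant k near the mean so the
# squared terms stay small and the final subtraction no longer cancels.
varFuncShifted <- function(x) {
  x <- as.numeric(x)
  x <- x[!is.na(x)]
  k <- x[1]                               # any value near the mean will do
  s1 <- sum(x - k)
  s2 <- sum((x - k)^2)
  (s2 - s1^2 / length(x)) / (length(x) - 1)
}

# varFuncShifted(df$large) should now agree with var(df$large), i.e. about 1.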

Upvotes: 1
