Reputation: 948
As an exercise, I am trying to simulate the F distribution of the ratio of two independent sample variances with the Monte Carlo method in R. But my result comes out significantly larger than it should be. Why?
emulateF <- function(numberOfEmulations, sampleSize1, sampleSize2) {
  ratioVec <- NULL
  for (i in 1:numberOfEmulations) {
    sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
    sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
    ratio <- var(sample1) / var(sample2)
    if (ratio >= 1) {
      ratioVec <- c(ratioVec, ratio)
    } else {
      ratioVec <- c(ratioVec, 1/ratio)
    }
  }
  return(quantile(ratioVec, 0.975))
}
I supposed that the result of calling emulateF(10000, 30, 30) should be very close to qf(0.975, 29, 29). But each time it is about 10% higher. Why?
> qf(0.975,29,29)
[1] 2.100996
and
> resultsVec <- NULL
> for (i in 1:10) {
+   resultsVec <- c(resultsVec, emulateF(10000, 30, 30))
+ }
> resultsVec
97.5% 97.5% 97.5% 97.5% 97.5% 97.5% 97.5% 97.5%
2.311599 2.374442 2.377750 2.330585 2.300294 2.359123 2.344875 2.340269
97.5% 97.5%
2.307880 2.350104
>
If I change sd = 9 to the standard sd = 1, the problem remains.
Upvotes: 0
Views: 146
Reputation: 6073
The fix to your code is to remove the if statement. Your if statement replaces every ratio below 1 with its reciprocal, so every stored value is forced to be greater than 1. That shouldn't be: the F distribution includes values below 1, and folding them into the upper tail shifts the 97.5% quantile upward.
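To make this concrete, here is a minimal sketch of the asker's function with the if statement removed (the function name emulateF2 and the seed are just for illustration):

```r
emulateF2 <- function(numberOfEmulations, sampleSize1, sampleSize2) {
  ratioVec <- numeric(numberOfEmulations)  # preallocate rather than growing with c()
  for (i in 1:numberOfEmulations) {
    sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
    sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
    ratioVec[i] <- var(sample1) / var(sample2)  # store the raw ratio, no folding
  }
  quantile(ratioVec, 0.975)
}

set.seed(123)
emulateF2(10000, 30, 30)  # comes out close to qf(0.975, 29, 29), about 2.10
```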
FWIW, here's similar code that uses apply instead of a for loop.
myF <- function(n, n1, n2) {
  # each row of the matrix is one simulated sample
  samp1 <- matrix(rnorm(n1*n, mean=0, sd=9), nrow=n, ncol=n1)
  samp2 <- matrix(rnorm(n2*n, mean=0, sd=9), nrow=n, ncol=n2)
  # row-wise ratio of sample variances
  f <- apply(samp1, 1, var) / apply(samp2, 1, var)
  return(quantile(f, 0.975))
}
set.seed(789)
myF(1e4, 30, 30)
2.09744
qf(0.975, 29, 29)
2.100996
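As a side note, the ~2.3 values the original loop produced can be derived directly. With equal degrees of freedom, F and 1/F have the same distribution, so the folded statistic max(F, 1/F) satisfies P(max(F, 1/F) <= x) = P(1/x <= F <= x) = 2*pf(x, d, d) - 1. Its 97.5% quantile is therefore the plain F quantile at (1 + 0.975)/2 = 0.9875:

```r
# 97.5% quantile of max(F, 1/F) for F(29, 29):
# solve 2*pf(x, 29, 29) - 1 = 0.975, i.e. pf(x, 29, 29) = 0.9875
qf(0.9875, 29, 29)  # lands in the ~2.3 range the original loop reported
```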
Upvotes: 1