Serhii Kushchenko
Serhii Kushchenko

Reputation: 948

My F distribution emulation with Monte Carlo method In R does not fit. Why?

As an exercise, I try to imitate the F distribution of the ratio of two independent variances with the Monte Carlo method in R. But my result occurs significantly larger than it should be. Why?

emulateF <- function (numberOfEmulations, sampleSize1, sampleSize2){
ratioVec <- NULL
for (i in 1:numberOfEmulations) {
    sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
    sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
    ratio <- var (sample1) / var (sample2)
    if (ratio >= 1) {
        ratioVec <- c(ratioVec, ratio)
    } else {
        ratioVec <- c(ratioVec, 1/ratio)
    }
    }   
return (quantile (ratioVec, 0.975))
}

I supposed that the result of this function execution emulateF (10000, 30, 30) should be very similar to qf(0.975,29,29). But each time it is about 10% higher. Why?

> qf(0.975,29,29)
[1] 2.100996

and

> for (i in 1:10) {
+ resultsVec <- c (resultsVec, emulateF (10000, 30, 30))
+ }
> resultsVec
   97.5%    97.5%    97.5%    97.5%    97.5%    97.5%    97.5%    97.5% 
2.311599 2.374442 2.377750 2.330585 2.300294 2.359123 2.344875 2.340269 
   97.5%    97.5% 
2.307880 2.350104 
> 

If I change the sd = 9 to standard sd = 1, the problem remains.

Upvotes: 0

Views: 146

Answers (1)

DanY
DanY

Reputation: 6073

The fix to your code is to remove the if statement. Your if statement is forcing every stored value to be greater than 1. That shouldn't be.

FWIW, here's similar code that uses apply instead of a for loop.

myF <- function(n, n1, n2) {
    samp1 <- matrix(rnorm(n1*n, mean=0, sd=9), nrow=n, ncol=n1)
    samp2 <- matrix(rnorm(n2*n, mean=0, sd=9), nrow=n, ncol=n2)
    f <- apply(samp1, 1, var) / apply(samp2, 1, var)
    return(quantile(f, 0.975))
}

set.seed(789)
myF(1e4, 30, 30)
2.09744

qf(0.975, 29, 29)
2.100996

Upvotes: 1

Related Questions