Reputation: 948
As an exercise, I am trying to simulate the F distribution of the ratio of two independent sample variances with the Monte Carlo method in R. But my result comes out significantly larger than it should be. Why?
emulateF <- function(numberOfEmulations, sampleSize1, sampleSize2) {
  ratioVec <- NULL
  for (i in 1:numberOfEmulations) {
    sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
    sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
    ratio <- var(sample1) / var(sample2)
    if (ratio >= 1) {
      ratioVec <- c(ratioVec, ratio)
    } else {
      ratioVec <- c(ratioVec, 1/ratio)
    }
  }
  return(quantile(ratioVec, 0.975))
}
I supposed that the result of calling emulateF(10000, 30, 30) should be very close to qf(0.975, 29, 29). But each time it is about 10% higher. Why?
> qf(0.975,29,29)
[1] 2.100996
and
> resultsVec <- NULL
> for (i in 1:10) {
+   resultsVec <- c(resultsVec, emulateF(10000, 30, 30))
+ }
> resultsVec
97.5% 97.5% 97.5% 97.5% 97.5% 97.5% 97.5% 97.5%
2.311599 2.374442 2.377750 2.330585 2.300294 2.359123 2.344875 2.340269
97.5% 97.5%
2.307880 2.350104
>
If I change sd = 9 to the standard sd = 1, the problem remains.
Upvotes: 0
Views: 146
Reputation: 6073
The fix to your code is to remove the if statement. Your if statement replaces every ratio below 1 with its reciprocal, so every stored value is forced to be greater than 1. That shouldn't be: the F distribution includes values below 1, and folding them into the upper tail shifts the 97.5% quantile upward.
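To make this concrete, here is a minimal sketch of the asker's function with the if statement removed (the function name emulateF2 and the seed are just for illustration):

```r
emulateF2 <- function(numberOfEmulations, sampleSize1, sampleSize2) {
  ratioVec <- numeric(numberOfEmulations)  # preallocate rather than growing with c()
  for (i in 1:numberOfEmulations) {
    sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
    sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
    ratioVec[i] <- var(sample1) / var(sample2)  # store the raw ratio, no folding
  }
  quantile(ratioVec, 0.975)
}

set.seed(123)
emulateF2(10000, 30, 30)  # comes out close to qf(0.975, 29, 29), about 2.10
```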
FWIW, here's similar code that uses apply instead of a for loop.
myF <- function(n, n1, n2) {
  # each row of the matrix is one simulated sample
  samp1 <- matrix(rnorm(n1*n, mean=0, sd=9), nrow=n, ncol=n1)
  samp2 <- matrix(rnorm(n2*n, mean=0, sd=9), nrow=n, ncol=n2)
  # row-wise ratio of sample variances
  f <- apply(samp1, 1, var) / apply(samp2, 1, var)
  return(quantile(f, 0.975))
}
set.seed(789)
myF(1e4, 30, 30)
2.09744
qf(0.975, 29, 29)
2.100996
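As a side note, the ~2.3 values the original loop produced can be derived directly. With equal degrees of freedom, F and 1/F have the same distribution, so the folded statistic max(F, 1/F) satisfies P(max(F, 1/F) <= x) = P(1/x <= F <= x) = 2*pf(x, d, d) - 1. Its 97.5% quantile is therefore the plain F quantile at (1 + 0.975)/2 = 0.9875:

```r
# 97.5% quantile of max(F, 1/F) for F(29, 29):
# solve 2*pf(x, 29, 29) - 1 = 0.975, i.e. pf(x, 29, 29) = 0.9875
qf(0.9875, 29, 29)  # lands in the ~2.3 range the original loop reported
```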
Upvotes: 1