Reputation: 1039
I am conducting a small simulation study to judge how good two normality tests really are. My plan is to generate a multitude of normal distribution samples of not too many observations and determine how often each test rejects the null hypothesis of normality.
The (incomplete) code I have so far is
library(nortest)
y<-replicate(10000,{
x<-rnorm(50)
ad.test(x)$p.value
ks.test(x,y=pnorm)$p.value
}
)
Now I would like to count the proportion of these p-values that are smaller than 0.05 for each test. Could you please tell me how I could do that? I apologise if this is a newbie question, but I myself am new to R.
Thank you.
Upvotes: 3
Views: 309
Reputation: 226097
library(nortest)
nsim <- 10000
nx <- 50
set.seed(101)
y <- replicate(nsim,{
x<-rnorm(nx)
c(ad=ad.test(x)$p.value,
ks=ks.test(x,y=pnorm)$p.value)
}
)
apply(y<0.05,MARGIN=1,mean)
## ad ks
## 0.0534 0.0480
Using MARGIN=1
tells apply
to take the mean across rows, rather than columns -- this is sensible given the ordering that replicate()
's built-in simplification produces.
For examples of this type, the type I error rates of any standard tests will be extremely close to their nominal value (0.05 in this example).
Upvotes: 2
Reputation: 693
Your code isn't outputting the p-values. You could do something like this:
rep_test <- function(reps=10000) {
p_ks <- rep(NA, reps)
p_ad <- rep(NA, reps)
for (i in 1:reps) {
x <- rnorm(50)
p_ks[i] <- ks.test(x, y=pnorm)$p.value
p_ad[i] <- ad.test(x)$p.value
}
return(data.frame(cbind(p_ks, p_ad)))
}
sum(test$p_ks<.05)/10000
sum(test$p_ad<.05)/10000
Upvotes: 1
Reputation: 453
If you run each test separately, then you can simply count which vals are stored in y that are less than 0.05.
y<-replicate(1000,{
x<-rnorm(50)
ks.test(x,y=pnorm)$p.value})
length(which(y<0.05))
Upvotes: 2