Reputation: 271
I am running a power analysis using a normal LMM in R. I have seven input parameters, two of which I do not need to test for (no. of years and no. of sites). The other 5 parameters are the intercept, slope and the random effects standard deviation of the residual, intercept and slope.
Given that my response data (year is the sole explanatory variable in the model) is bound between (-1, +1), the intercept also falls in this range. However, what I am finding is that if I run, say, 1000 simulations with a given intercept and slope (which I am treating as constant over 10 years), then if the random effects intercept SD falls below a certain value, there are many simulations where the random effects intercept SD is zero. If I inflate the intercept SD then this seems to simulate correctly (please see below where I use residual Sd=0.25, intercept SD = 0.10 and slope SD = 0.05; if I increase intercept SD to 0.2, this is correctly simulated; or if I drop the residual SD to say 0.05, the variance parameters are correctly simulated).
Is this problem due to my coercion of the range to be (-1, +1)?
I include the code for my function and the processing of the simulations below, if this would help:
Function: generating the data:
normaldata <- function (J, K, beta0, beta1, sigma_resid,
sigma_beta0, sigma_beta1){
year <- rep(rep(0:J),K) # 0:J replicated K times
site <- rep (1:K, each=(J+1)) # 1:K sites, repeated J years
mu.beta0_true <- beta0
mu.beta1_true <- beta1
# random effects variance parameters:
sigma_resid_true <- sigma_resid
sigma_beta0_true <- sigma_beta0
sigma_beta1_true <- sigma_beta1
# site-level parameters:
beta0_true <<- rnorm(K, mu.beta0_true, sigma_beta0_true)
beta1_true <<- rnorm(K, mu.beta1_true, sigma_beta1_true)
# data
y <<- rnorm(n = (J+1)*K, mean = beta0_true[site] + beta1_true[site]*(year),
sd = sigma_resid_true)
# NOT SURE WHETHER TO IMPOSE THE LIMITS HERE OR LATER IN CODE:
y[y < -1] <- -1 # Absolute minimum
y[y > 1] <- 1 # Absolute maximum
return(data.frame(y, year, site))
}
Processing the simulated code:
vc1 <- as.data.frame(VarCorr(lme.power))
vc2 <- as.numeric(attributes(VarCorr(lme.power)$site)$stddev)
n.sims = 1000
sigma.resid <- rep(0, n.sims)
sigma.intercept <- rep(0, n.sims)
sigma.slope <- rep(0,n.sims)
intercept <- rep(0,n.sims)
slope <- rep(0,n.sims)
signif <- rep(0,n.sims)
for (s in 1:n.sims){
y.data <- normaldata(10,200, 0.30, ((0-0.30)/10), 0.25, 0.1, 0.05)
lme.power <- lmer(y ~ year + (1+year | site), data=y.data)
summary(lme.power)
theta.hat <- fixef(lme.power)[["year"]]
theta.se <- se.fixef(lme.power)[["year"]]
signif[s] <- ((theta.hat + 1.96*theta.se) < 0) |
((theta.hat - 1.96*theta.se) > 0) # returns TRUE or FALSE
signif[s]
betas <- fixef(lme.power)
intercept[s] <- betas[1]
slope[s] <- betas[2]
vc1 <- as.data.frame(VarCorr(lme.power))
vc2 <- as.numeric(attributes(VarCorr(lme.power)$site)$stddev)
sigma.resid[s] <- vc1[4,5]
sigma.intercept[s] <- vc2[1]
sigma.slope[s] <- vc2[2]
cat(paste(s, " ")); flush.console()
}
power <- mean (signif) # proportion of TRUE
power
summary(sigma.resid)
summary(sigma.intercept)
summary(sigma.slope)
summary(intercept)
summary(slope)
Thank you in advance for any help you can offer.
Upvotes: 4
Views: 3035
Reputation: 226087
This is really more of a statistical than a computational question, but the short answer is: you haven't made any mistakes, this is exactly as expected. This example on rpubs runs some simulations of a Normally distributed response (i.e. it corresponds exactly to the model assumed by LMM software, so the constraint you're worried about isn't an issue).
The lefthand histogram below is from simulations with 25 samples in 5 groups, equal variance (of 1) within and between groups; the righthand histogram is from simulations with 15 samples in 3 groups.
The sampling distribution of variances for null cases (i.e., no real between-group variation) is known to have a point mass or "spike" at zero; it's not surprising (although as far as I know not theoretically worked out) that the sampling distribution of the variances should also have a point mass at zero when the between-sample is non-zero but small and/or when the sample is small and/or noisy.
http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#zero-variance has more on this topic.
Upvotes: 4