Sebastian N
Sebastian N

Reputation: 61

How do I simulate data for a power analysis of a repeated measure linear mixed effects regression using simr?

I would like to conduct a simulation-based power analysis for a linear mixed model in lmer with repeated measures from scratch. I understand that simr might be the package to go with. However, I do not understand what I have to do and am not able to make sense of anything I read online.

I want to test a random intercept model where

i.e., three measurements are nested in participants using a random intercept.

How do I simulate data so that I am able to find out what would be an appropriate sample size for an interaction effect (e.g. group:time1) of beta = .3 to be found with a power of .80? And which additional info do I need to do that?

Upvotes: 6

Views: 1567

Answers (1)

Robert Long
Robert Long

Reputation: 6867

There are packages such as simr which will do all of this, and more, for you (and will handle unbalanced designs too), but here is a simple approach to simulating data for a mixed model, which you can then use in a power analysis, from scratch:

There are several important parameters to consider:

  • the overall data size
  • whether the data are balanced or not
  • the number of participants
  • the fixed effects
  • the random effects variance
  • the residual variance

Here we use a balanced design, which makes the setup easier, but it would not be too hard to extend it to unbalanced designs:

library(lme4)
set.seed(15)

N <- 10    # number of unique values of a and b for each participant
N.participant <- 20   # number of participants
betas <- c(10, 1, 2, 3, 4, 5, 6, 7)   # fixed effects
s <- 0.5   # variance of random effects
e <- 1  # residual variance

# form the balanced data frame
dt <- expand.grid(a = rnorm(N), b = rnorm(N), c = c(0,1), d = c(0,1), time = c(0,1,2), group = c(0,1), participant = LETTERS[1:N.participant])

# form the model matrix for fixed effects
X <- model.matrix(~ a + b + c + d + time + group + time:group, data = dt)

# set a vector of 1s for the outcome. This is needed so we can easily compute the
# model matrix for random effects (in this case just the random intercetps)
dt$y <- 1

# The final model formula
myFormula <- "y ~ a + b + c + d + time + group + time:group + (1 | participant)"

# obtain the model matrix for random effects
foo <- lFormula(eval(myFormula), dt)
Z <- t(as.matrix(foo$reTrms$Zt))

# simulate the random effects
u <- rnorm(N.participant , 0, s)

# simulate y using the general mixed model equation and residual variance e
dt$y <- X %*% betas + Z %*% u + rnorm(nrow(dt), 0, e)

# fit the model
m0 <- lmer(myFormula, dt)
summary(m0)

You would run this a number of times for each set of parameters, with different seeds of course, and compute the relevant Monte Carlo estimates, and the achieved power for each set.

Upvotes: 3

Related Questions