Theo
Theo

Reputation: 3

Simulate data using glmmTMB simulate_new()

I want to do power analysis based on preliminary experiments to inform experimental design choices of future experiments.

I am using glmmTMB to fit a negative binomial model to my data, using fixed predictors + random effects. I previously used simr to perform poser analysis of models fitted using glmer, but according to this issue in simr GitHub, simr does not support glmmTMB models.

I am thus trying to find a workaround solution based on glmmTMB built-in functions. I am planning on doing the following:

  1. Fit a model using my preliminary data
  2. Use the estimated coefficients to simulate new data using simulate_new()
  3. Fit the model to the new simulated data to assess significance
  4. Repeat steps 2. and 3. to compute power.

To this end, I am trying to get what the different values in the newparams argument of simulate_new() represent. In the simulate_new() help page is the following example:

sim_count <- simulate_new(~ mined + (1|site),
             newdata = Salamanders,
             zi = ~ mined,
             family = nbinom2,
             newparams = list(beta = c(2, 1),
                         betazi = c(-0.5, 0.5), ## logit-linear model for zi
                         betadisp = log(2), ## log(NB dispersion)
                         theta = log(1)) ## log(among-site SD)
)

However, when fitting the corresponding model (summary(glmmTMB(count ~ mined + (1|site), data=Salamanders, ziformula=~mined, family=nbinom2))), I get quite different coefficients:

 Family: nbinom2  ( log )
Formula:          count ~ mined + (1 | site)
Zero inflation:         ~mined
Data: Salamanders

     AIC      BIC   logLik deviance df.resid 
  1741.4   1768.2   -864.7   1729.4      638 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 site   (Intercept) 0.1338   0.3658  
Number of obs: 644, groups:  site, 23

Dispersion parameter for nbinom2 family (): 0.965 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.5573     0.3711  -1.501    0.133    
minedno       1.4674     0.3616   4.059 4.94e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   0.2404     0.4863   0.494  0.62103   
minedno      -2.2561     0.7829  -2.882  0.00396 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

It thus seems that the help page from simulate_new() uses arbitrary values as an example, not values corresponding to coefficients that represent the data? If I wanted to correctly simulate new data, would the following be correct?

sim_count <- simulate_new(~ mined + (1|site),
             newdata = Salamanders,
             zi = ~ mined,
             family = nbinom2,
             newparams = list(beta = c(-0.56, 1.47),
                         betazi = c(0.24, -2.26), ## logit-linear model for zi
                         betadisp = log(0.97), ## log(NB dispersion)
                         theta = log(0.37)) ## log(among-site SD)

Thank you in advance for your help and insights, Cheers, Theo

Upvotes: 0

Views: 19

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226742

(Copied from answer on the glmmTMB issues list)

You're correct. Of course, if you want to simulate from an already fitted model, you just need simulate() — the use case of simulate_new() is when you want to simulate using different parameters than those from a fitted model, or where you are in a pure-simulation scenario where you don't have any real data to fit a model to in the first place ...

If we fit a model to the simulated response, we get answers that are reasonably close to the values we put in (to be more thorough, we could do this many times and show that the distribution of fitted values is centered on the true values [i.e., there is not much bias]).

set.seed(101)
sim_count <- simulate_new(~ mined + (1|site),
             newdata = Salamanders,
             zi = ~ mined,
             family = nbinom2,
             newparams = list(beta = c(2, 1),
                         betazi = c(-0.5, 0.5), ## logit-linear model for zi
                         betadisp = log(2), ## log(NB dispersion)
                         theta = log(1)) ## log(among-site SD)
)
Salamanders$sim_count <- sim_count[[1]]
model <- glmmTMB(sim_count ~ mined + (1|site), 
      data=Salamanders, ziformula=~mined, family=nbinom2)
coef(summary(model))
$cond
            Estimate Std. Error  z value     Pr(>|z|)
(Intercept) 1.648139  0.2572210 6.407481 1.479438e-10
minedno     1.386065  0.3549957 3.904455 9.443779e-05

$zi
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -0.7163624  0.1509574 -4.745461 2.080317e-06
minedno      0.6331215  0.1863901  3.396755 6.818988e-04

$disp
NULL

Upvotes: 1

Related Questions