Reputation: 155
I am getting weird output when I use the tab_model() function from the sjPlot package together with the glmmTMB() function from the glmmTMB package to fit a generalized linear mixed model with a beta-family response. The intercept and the marginal R² look very weird.
What is going on here?
df <- structure(list(date = structure(c(6L, 5L, 6L, 1L, 4L, 2L, 2L,
2L, 2L, 4L, 6L, 1L, 6L, 6L, 2L, 2L, 4L, 4L, 5L, 1L), .Label = c("2021-03-17",
"2021-04-07", "2021-04-13", "2021-04-27", "2021-05-11", "2021-05-27"
), class = "factor"), kettlehole = structure(c(4L, 6L, 6L, 4L,
7L, 2L, 6L, 5L, 3L, 5L, 1L, 1L, 1L, 1L, 4L, 4L, 5L, 4L, 3L, 5L
), .Label = c("1189", "119", "1202", "149", "172", "2484", "552"
), class = "factor"), plot = structure(c(8L, 4L, 4L, 3L, 7L,
8L, 1L, 3L, 6L, 4L, 4L, 3L, 6L, 1L, 2L, 7L, 5L, 8L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), treatment = structure(c(2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 2L, 1L), .Label = c("a", "b"), class = "factor"), distance = structure(c(2L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 1L, 1L), .Label = c("2", "5"), class = "factor"), soil_moisture_content = c(0.2173,
0.1028, 0.148, 0.3852, 0.1535, 0.2618, 0.2295, 0.222, 0.3145,
0.1482, 0.2442, 0.3225, 0.1715, 0.1598, 0.2358, 0.274, 0.1543,
0.144, 0.128, 0.361), yield = c(0.518, 0.434, 0.35, 0.599, 0.594,
0.73, 0.568, 0.442, 0.695, 0.73, 0.667, 0.49, 0.744, 0.56, 0.485,
0.532, 0.668, 0.511, 0.555, 0.718), weed_coverage = c(0, 0.045,
0.03, 0.002, 0.11, 0.003, 0.01, 0, 0.02, 0.002, 0, 0.008, 0,
0.002, 0, 0.006, 0, 0, 0.02, 0.002)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
library(sjPlot)
library(glmmTMB)
modop <- glmmTMB(yield ~ soil_moisture_content + weed_coverage + distance + treatment +
                   (1 | kettlehole/plot) + (1 | date),
                 family = "beta_family", data = df)
tab_model(modop)
EDIT
So here is a screenshot of the result of tab_model() applied to my actual dataset with n = 630. I think the problem is that the model is overfitted, as mentioned by Ben, and needs to be adjusted by eliminating unnecessary predictors.
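As a minimal sketch of what that adjustment might look like (which terms to drop is only an illustration here, not a recommendation):

## hypothetical simplification: drop one fixed effect and the
## near-zero date random effect discussed in the answer below
modop_small <- update(modop, . ~ . - weed_coverage - (1 | date))
tab_model(modop_small)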
Upvotes: 2
Views: 1720
Reputation: 226597
tl;dr the weird intercept results seem to be a bug in sjPlot::tab_model, which should be reported to the maintainers at the sjPlot issues list; it seems that tab_model is mistakenly exponentiating the dispersion parameter when it shouldn't. However, there are other issues with your model (it's probably overfitted), which are what's messing up your marginal R^2 value.
Here is a run of some sensible simulated data that shows the problem with tab_model():
set.seed(101)
## rbeta() function parameterized by mean and shape
my_rbeta <- function(n, mu, shape0) {
rbeta(n, shape1 = mu*shape0, shape2 = (1-mu)*shape0)
}
n <- 100; ng <- 10
dd <- data.frame(x = rnorm(n),
f = factor(rep(1:(n/ng), ng)))
dd <- transform(dd,
y = my_rbeta(n,
mu = plogis(-1 + 2*x + rnorm(ng)[f]),
shape0 = 5))
m1 <- glmmTMB(y ~ x + (1|f), family = "beta_family", dd)
tab_model(m1)
The results of sigma(m1), print(m1), and summary(m1) all agree that the estimated dispersion parameter is 5.56 (close to its nominal value of 5), and agree with confint(m1, "disp_"):
2.5 % 97.5 % Estimate
sigma 4.068351 7.606602 5.562942
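For completeness, those checks in runnable form (same calls as mentioned above):

sigma(m1)              ## estimated dispersion, ~ 5.56
confint(m1, "disp_")   ## Wald CI for the dispersion parameter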
However, tab_model() reports the dispersion parameter as 260.6, displaying two problems:
- it shows exp(5.563) = 260.6 instead of 5.563, and the confidence intervals are similarly (incorrectly) exponentiated (a quick check of this is sketched below)
- the dispersion parameter is labeled "(Intercept)", which is confusing (it is technically the "intercept" of the dispersion model)
However, the R^2 values look sensible; we'll come back to this.
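A quick sanity check of the double-exponentiation diagnosis (a sketch, using the simulated fit above):

## sigma() gives the beta-family dispersion on its natural scale;
## exponentiating it reproduces the value tab_model() displays
sigma(m1)        ## ~ 5.56
exp(sigma(m1))   ## ~ 260.6, matching tab_model()'s output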
What about the model itself?
A reasonable rule of thumb (see e.g. Harrell, Regression Modeling Strategies) says you should generally aim to have about 1 parameter for every 10-20 observations. Between fixed and random effects, you have 9 parameters (length(modop$fit$par), or nobs(modop) - df.residual(modop)) for 20 observations.
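For concreteness, those counts can be pulled out directly (both approaches should agree):

length(modop$fit$par)              ## total number of estimated parameters (9)
nobs(modop) - df.residual(modop)   ## same count via the residual degrees of freedom
nobs(modop)                        ## 20 observations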
Running diagnose(modop) (note: I am using a fixed/development version of diagnose(), so your results may differ slightly) gives:
diagnose(modop)
Unusually large coefficients (|x|>10):
theta_1|date.1
-11.77722
Large negative coefficients in zi (log-odds of zero-inflation), dispersion, or random effects (log-standard deviations) suggest unnecessary components (converging to zero on the constrained scale) ...
(If you look at summary(modop) you'll see that the estimated standard deviation of the date random effect is 7e-6, about 4 orders of magnitude less than the next-largest random effect term ...)
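One way to see this directly is to print the variance components (a quick sketch):

## random-effect standard deviations; the `date` term is effectively zero
VarCorr(modop)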
modop2 <- update(modop, . ~ . - (1|date))
diagnose(modop2) says this model is OK.
However, tab_model(modop2) still gives a suspicious conditional R^2 (1.038, i.e. > 1). Running performance::r2_nakagawa(modop2) directly (I believe this is the underlying machinery used by tab_model()) gives:
# R2 for Mixed Models
Conditional R2: 1.038
Marginal R2: 0.183
but with warnings:
1: mu of 1.5 is too close to zero, estimate of random effect variances may be unreliable.
2: Model's distribution-specific variance is negative. Results are not reliable.
I would basically conclude that this data set is just a little too small/model is too big to get useful R^2 values.
FWIW, I'm slightly concerned that tab_model() reports N_plot = 8 for this model: it ought to be reporting N_{plot:kettlehole} = 18, as in summary(modop2).
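If you want to double-check the grouping structure yourself, one possible way (a sketch) is to count the levels of each grouping factor used in the conditional model:

## one row per level of each random-effects grouping factor;
## plot:kettlehole should show 18, matching summary(modop2)
sapply(ranef(modop2)$cond, nrow)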
Upvotes: 3