Reputation: 155
I am getting weird output when I use the tab_model() function from the sjPlot package together with the glmmTMB() function from the glmmTMB package to fit a generalized linear mixed model with a beta-family response. The intercept and the marginal R² look very weird.
What is going on here?
df <- structure(list(date = structure(c(6L, 5L, 6L, 1L, 4L, 2L, 2L,
2L, 2L, 4L, 6L, 1L, 6L, 6L, 2L, 2L, 4L, 4L, 5L, 1L), .Label = c("2021-03-17",
"2021-04-07", "2021-04-13", "2021-04-27", "2021-05-11", "2021-05-27"
), class = "factor"), kettlehole = structure(c(4L, 6L, 6L, 4L,
7L, 2L, 6L, 5L, 3L, 5L, 1L, 1L, 1L, 1L, 4L, 4L, 5L, 4L, 3L, 5L
), .Label = c("1189", "119", "1202", "149", "172", "2484", "552"
), class = "factor"), plot = structure(c(8L, 4L, 4L, 3L, 7L,
8L, 1L, 3L, 6L, 4L, 4L, 3L, 6L, 1L, 2L, 7L, 5L, 8L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), treatment = structure(c(2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 2L, 1L), .Label = c("a", "b"), class = "factor"), distance = structure(c(2L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 1L, 1L), .Label = c("2", "5"), class = "factor"), soil_moisture_content = c(0.2173,
0.1028, 0.148, 0.3852, 0.1535, 0.2618, 0.2295, 0.222, 0.3145,
0.1482, 0.2442, 0.3225, 0.1715, 0.1598, 0.2358, 0.274, 0.1543,
0.144, 0.128, 0.361), yield = c(0.518, 0.434, 0.35, 0.599, 0.594,
0.73, 0.568, 0.442, 0.695, 0.73, 0.667, 0.49, 0.744, 0.56, 0.485,
0.532, 0.668, 0.511, 0.555, 0.718), weed_coverage = c(0, 0.045,
0.03, 0.002, 0.11, 0.003, 0.01, 0, 0.02, 0.002, 0, 0.008, 0,
0.002, 0, 0.006, 0, 0, 0.02, 0.002)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
library(sjPlot)
library(glmmTMB)
modop <- glmmTMB(yield ~ soil_moisture_content + weed_coverage + distance + treatment +
                   (1 | kettlehole/plot) + (1 | date),
                 family = "beta_family", data = df)
tab_model(modop)
EDIT
So here is a screenshot of the result of tab_model() applied to my actual dataset with n = 630. I think the problem is that the model is overfitted, as mentioned by Ben, and needs to be adjusted by eliminating unnecessary predictors.
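As a minimal sketch of what that adjustment might look like (which terms to drop is only an illustration here, not a recommendation):

## hypothetical simplification: drop one fixed effect and the
## near-zero date random effect discussed in the answer below
modop_small <- update(modop, . ~ . - weed_coverage - (1 | date))
tab_model(modop_small)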
Upvotes: 2
Views: 1720
Reputation: 226597
tl;dr the weird intercept results seem to be a bug in sjPlot::tab_model, which should be reported to the maintainers at the sjPlot issues list; it seems that tab_model is mistakenly exponentiating the dispersion parameter when it shouldn't. However, there are other issues with your model (it's probably overfitted), which are what's messing up your marginal R^2 value.
Here is a run of some sensible simulated data that shows the problem with tab_model():
set.seed(101)
## rbeta() function parameterized by mean and shape
my_rbeta <- function(n, mu, shape0) {
rbeta(n, shape1 = mu*shape0, shape2 = (1-mu)*shape0)
}
n <- 100; ng <- 10
dd <- data.frame(x = rnorm(n),
f = factor(rep(1:(n/ng), ng)))
dd <- transform(dd,
y = my_rbeta(n,
mu = plogis(-1 + 2*x + rnorm(ng)[f]),
shape0 = 5))
m1 <- glmmTMB(y ~ x + (1|f), family = "beta_family", dd)
tab_model(m1)
The results of sigma(m1), print(m1), and summary(m1) all agree that the estimated dispersion parameter is 5.56 (close to its nominal value of 5), and agree with confint(m1, "disp_"):
2.5 % 97.5 % Estimate
sigma 4.068351 7.606602 5.562942
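For completeness, those checks in runnable form (same calls as mentioned above):

sigma(m1)              ## estimated dispersion, ~ 5.56
confint(m1, "disp_")   ## Wald CI for the dispersion parameter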
However, tab_model() reports the dispersion parameter as 260.6, displaying two problems:
- it shows exp(5.563) = 260.6 instead of 5.563, and the confidence intervals are similarly (incorrectly) exponentiated (a quick check of this is sketched below)
- the dispersion parameter is labeled "(Intercept)", which is confusing (it is technically the "intercept" of the dispersion model)
However, the R^2 values look sensible; we'll come back to this.
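A quick sanity check of the double-exponentiation diagnosis (a sketch, using the simulated fit above):

## sigma() gives the beta-family dispersion on its natural scale;
## exponentiating it reproduces the value tab_model() displays
sigma(m1)        ## ~ 5.56
exp(sigma(m1))   ## ~ 260.6, matching tab_model()'s output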
What about the model itself?
A reasonable rule of thumb (see e.g. Harrell, Regression Modeling Strategies) says you should generally aim to have about 1 parameter for every 10-20 observations. Between fixed and random effects, you have 9 parameters (length(modop$fit$par), or nobs(modop) - df.residual(modop)) for 20 observations.
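For concreteness, those counts can be pulled out directly (both approaches should agree):

length(modop$fit$par)              ## total number of estimated parameters (9)
nobs(modop) - df.residual(modop)   ## same count via the residual degrees of freedom
nobs(modop)                        ## 20 observations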
Running diagnose(modop) (note: I am using a fixed/development version of diagnose(), so your results may differ slightly) gives:
diagnose(modop)
Unusually large coefficients (|x|>10):
theta_1|date.1
-11.77722
Large negative coefficients in zi (log-odds of zero-inflation), dispersion, or random effects (log-standard deviations) suggest unnecessary components (converging to zero on the constrained scale) ...
(If you look at summary(modop) you'll see that the estimated standard deviation of the date random effect is 7e-6, about 4 orders of magnitude less than the next-largest random effect term ...)
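One way to see this directly is to print the variance components (a quick sketch):

## random-effect standard deviations; the `date` term is effectively zero
VarCorr(modop)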
modop2 <- update(modop, . ~ . - (1|date))
diagnose(modop2) says this model is OK.
However, tab_model(modop2) still gives a suspicious conditional R^2 (1.038, i.e. > 1). Running performance::r2_nakagawa(modop2) directly (I believe this is the underlying machinery used by tab_model()) gives:
# R2 for Mixed Models
Conditional R2: 1.038
Marginal R2: 0.183
but with warnings:
1: mu of 1.5 is too close to zero, estimate of random effect variances may be unreliable.
2: Model's distribution-specific variance is negative. Results are not reliable.
I would basically conclude that this data set is just a little too small/model is too big to get useful R^2 values.
FWIW, I'm slightly concerned that tab_model() reports N_plot = 8 for this model: it ought to be reporting N_{plot:kettlehole} = 18, as in summary(modop2).
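If you want to double-check the grouping structure yourself, one possible way (a sketch) is to count the levels of each grouping factor used in the conditional model:

## one row per level of each random-effects grouping factor;
## plot:kettlehole should show 18, matching summary(modop2)
sapply(ranef(modop2)$cond, nrow)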
Upvotes: 3