Reputation: 1459

Including offset in poisson models double number of candidate models

Revised question and code (2020-07-10)

This question has undertaken several edits as the analysis has progressed. I've included the most recent information here at the top, but for those who are stuck at other steps, please see the information below.

Recap: I'm modelling the relationships between several continuous variables (A, B & C), a categorical variable (YN), and a random intercept (ID). I've included an offset (T) because the count data was collected during behavioural observations of different durations. There's also a term for zero-inflation based on the random intercept. I'm comparing all subsets of this model to determine the best-fit models.

At present, the dredge function is throwing an error when I attempt to compare all subsets.

#simulated data
df <- data.frame(
  count = sample(size = 300, c(0:6), replace = TRUE, prob = c(0.7, 0.15, 0.05, 0.04, 0.03, 0.02, 0.01)),
  A = sample(size = 300, x = 24:65, replace = TRUE),
  B = runif(n = 300, min = 7, max = 19),
  C = rbeta(n = 300, shape1 = 1, shape2 = 25)*100,
  YN = sample(size = 300, c("Y", "N"), replace = TRUE, prob = c(0.2, 0.8)),
  T = runif(n = 300, min = 10, max = 20),
  ID = sample(size = 300, letters, replace = TRUE)
)
  
#standardize explanatory variables
dfSTD <- stdize(df, 
                omit.cols = c("count", "YN", "T", "ID"), 
                center = TRUE, scale = TRUE) 

#specify full model with standardized covariates for analysis
full <- stdizeFit(glmmTMB(count ~ z.A + z.B + z.C + YN + offset(log(T)) + (1|ID),
                          zi = ~ (1|ID),
                          family = poisson(),
                          data = dfSTD,
                          na.action = "na.fail"),
                  data = dfSTD, 
                  which = "formula", 
                  evaluate = FALSE)

#compare all possible model subsets
all <- dredge(full, 
              beta = "sd", 
              fixed = "cond(offset(log(T)))")

Error in nobs.default(global.model) : no 'nobs' method is available

I'm working in R version 3.6.1.

I'm modelling the relationships between several continuous variables (A, B & C), a categorical variable (YN), and a random intercept (ID). I've included an offset (T) because the count data was collected during behavioural observations of different durations. There's also a term for zero-inflation based on the random intercept.

I'm comparing all subsets of this model to determine the best-fit models.

My code looks like the following:

full <- glmmTMB(count ~ A + B + C + YN + offset(log(T)) + (1|ID), 
                zi = ~ (1|ID),
                family = poisson(),
                data = df, 
                na.action = "na.fail")
View(dredge(full, beta = sd))

But when I look at the table, it seems like the models are duplicated, i.e., with and without the offset. The table includes 32 models, and when I leave out the offset, there are only 16 models. Plus the coefficients are very similar. (subset of this table shown)

I've tried changing my code so that the offset is always included like below, but this leads to warnings.

    full <- glmmTMB(count ~ A + B + C + YN + (1|ID), 
                    offset = log(T), 
                    zi = ~ (1|ID),
                    family = poisson(),
                    data = df, 
                    na.action = "na.fail")
Warning messages:
1: In (function (start, objective, gradient = NULL, hessian = NULL,  :
  NA/NaN function evaluation
repeated 9 times

EDIT 2020/06/16

I've updated this post with the full model summary when the offset is included as a term:

 Family: poisson  ( log )
Formula:          count ~ A + B + C + YN + offset(log(T)) +  
    (1 | ID)
Zero inflation:      ~(1 | ID)
Data: df

     AIC      BIC   logLik deviance df.resid 
   286.4    317.4   -135.2    270.4      344 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 ID   (Intercept) 0.1006   0.3172  
Number of obs: 352, groups:  name, 64

Zero-inflation model:
 Groups Name        Variance  Std.Dev. 
 ID   (Intercept) 2.089e-08 0.0001445
Number of obs: 352, groups:  name, 64

Conditional model:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.582134   0.382926 -11.966  < 2e-16 ***
A              0.003517   0.170809   0.021  0.98357    
B              0.356596   0.162703   2.192  0.02840 *  
C              0.594737   0.095198   6.247 4.17e-10 ***
YN             -1.397989   0.538510  -2.596  0.00943 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4744     0.7323  -0.648    0.517

And when the offset included as a separate parameter outside of the formula:

Family: poisson  ( log )
Formula:          count ~ A + B + C + YN + (1 | ID) #NOTE: offset isn't listed here despite being included when I specified the model above
Zero inflation:      ~(1 | ID)
Data: df
 Offset: log(duration)

     AIC      BIC   logLik deviance df.resid 
   286.4    317.4   -135.2    270.4      344 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 ID   (Intercept) 0.1006   0.3172  
Number of obs: 352, groups:  name, 64

Zero-inflation model:
 Groups Name        Variance  Std.Dev. 
 ID   (Intercept) 2.089e-08 0.0001445
Number of obs: 352, groups:  name, 64

Conditional model:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.582134   0.382926 -11.966  < 2e-16 ***
A              0.003517   0.170809   0.021  0.98357    
B              0.356596   0.162703   2.192  0.02840 *  
C              0.594737   0.095198   6.247 4.17e-10 ***
YN             -1.397989   0.538510  -2.596  0.00943 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4744     0.7323  -0.648    0.517

EDIT 2020/06/24

A revised dredge function addresses the issue with retaining the offset variable in all models.

View(dredge(full, 
            beta = "sd"), #note sd was not in quotes above!
            fixed = "cond(offset(log(time)))") #retains offset in all models

However, this has led to a new issue:

Warning message:
In dredge(full, beta = "sd", fixed = "cond(offset(log(time)))") :
  do not know how to standardize coefficients of 'glmmTMB', argument 'beta' ignored

@Kamil Barton suggests below that I need to use strike and stdizeFit to address this warning message. But as of yet, I am unable to get this code to work without another error message. Furthermore, I'm concerned that standardizing my response variables (as suggested below) is the wrong way to go, and since standardizing the time variable results in negative values, this can no longer be logged in the model (produces NA values).

Regardless, here's my attempt at implementing Kamil's suggested approach using a nearly identical approach as that within the help file:

dfSTD <- stdize(select(df, counts, A, B, C, YN, T, ID),
                center = TRUE, scale = TRUE) 

full <- stdizeFit(glmmTMB(z.counts ~ z.A + z.B + z.C + z.YN + 
                              offset(log(z.T)) + (1|ID),
                          zi = ~ (1|ID),
                          family = poisson(),
                          data = dfSTD,
                          na.action = "na.fail"),
                   which = "formula", evaluate = FALSE)

Error in stdizeFit(glmmTMB(z.counts ~ z.A + z.B + z.C + z.YN + : 
  argument "data" is missing, with no default

Any thoughts on addressing these latest errors either Kamil's suggested approach or what I've done above would be greatly appreciated.

EDIT 2020/07/10

I have been able to address the above issue with the following chunk of code:

dfSTD <- stdize(df, omit.cols = c("counts", "YN", "T", "ID"), 
                center = TRUE, scale = TRUE) 

full <- stdizeFit(glmmTMB(counts ~ z.A + z.B + z.C + YN + 
                                  offset(log(T)) + (1|ID),
                          zi = ~ (1|ID),
                          family = poisson(),
                          data = dfSTD,
                          na.action = "na.fail"),
                  data = dfSTD, which = "formula", evaluate = FALSE)

However, now when I use the dredge function, I have another error.

allEVCounts <- dredge(full, 
                      beta = "sd", 
                      fixed = "cond(offset(log(T)))")

Error in nobs.default(global.model) : no 'nobs' method is available

Any help in determining how and where to address the 'nobs' method would be appreciated.

Upvotes: 1

Answers (2)

Lena

Reputation: 41

The solution, from Kamil Bartoń, is: in stdizeFit, set evaluate=TRUE , otherwise it produces a call rather then a fitted model object, which is not what you (and dredge) would expect.

Upvotes: -1

Kamil Bartoń

Reputation: 1562

dredge treats the offset term in the same way as fixed model terms, when it is included in the formula. To keep the offset in all models, add an argument fixed = "<offset term name>", which in this case is offset(log(T)):

dredge(full, beta = "sd", fixed = "cond(offset(log(T)))")

Note that in your code beta = sd (without quotes) is treated as beta = "none" because you pass a function sd not a character string "sd" and that is not an accepted value.

The warnings about "NA/NaN function evaluation" come from glmmTMB, some sub-models may not converge with the offset. Set options(warn = 1) (immediate warning messages) and use dredge with trace = TRUE to find out which ones.

Upvotes: 2

Including offset in poisson models double number of candidate models

Answers (2)

Related Questions