Is there a difference between gamma hurdle (two-part) models and zero-inflated gamma models?

Question

I have semicontinuous data (many exact zeros and continuous positive outcomes) that I am trying to model. I have largely learned about modeling data with substantial zero mass from Zuur and Ieno's Beginner's Guide to Zero-Inflated Models in R, which makes a distinction between zero-inflated gamma models and what they call "zero-altered" gamma models, which they describe as hurdle models that combine a binomial component for the zeros and a gamma component for the positive continuous outcome. I have been exploring the use of the ziGamma option in the glmmTMB package and comparing the resulting coefficients to a hurdle model that I built following the instructions in Zuur's book (pages 128-129), and they do not coincide. I'm having trouble understanding why not, as I know that the gamma distribution cannot take on the value of zero, so I suppose every zero-inflated gamma model is technically a hurdle model. Can anyone illuminate this for me? See more comments about the models below the code.
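For reference, the two formulations can be written side by side. This is only a sketch, in notation that is not from the book or the package: $\pi$ is the probability of a non-zero outcome, $\psi$ is the probability of a structural zero, and $g$ is a gamma density with mean $\mu$ and shape $\nu$.

```latex
% Hurdle (two-part, "zero-altered") gamma model
f_{\text{hurdle}}(y) =
\begin{cases}
1 - \pi, & y = 0,\\
\pi \, g(y \mid \mu, \nu), & y > 0.
\end{cases}

% Generic zero-inflated model with base distribution g
P(Y = 0) = \psi + (1 - \psi)\, P_g(Y = 0),
\qquad
f(y) = (1 - \psi)\, g(y \mid \mu, \nu) \quad \text{for } y > 0.
```

Because the gamma distribution places no mass at zero, $P_g(Y = 0) = 0$, so the zero-inflated form reduces to the hurdle form with $\psi = 1 - \pi$. In this notation both give the unconditional mean $E[Y] = \pi \mu$.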

library(tidyverse)
library(boot)
library(glmmTMB)
library(parameters)

### DATA

set.seed(2023)  # arbitrary seed so the simulated data are reproducible
id <- 1:75000
age <- sample(18:88, 75000, replace = TRUE)
gender <- sample(0:1, 75000, replace = TRUE)
cost <- c(rep(0, 30000), rgamma(n = 37500, shape = 5000, rate = 1), 
          sample(1:1000000, 7500, replace = TRUE))
disease <- sample(0:1, 75000, replace = TRUE)
time <- sample(30:3287, 75000, replace = TRUE)

df <- data.frame(cbind(id, disease, age, gender, cost, time))

# create binary variable for non-zero costs

df <- df %>% mutate(cost_binary = ifelse(cost > 0, 1, 0))

### HURDLE MODEL (MY VERSION)

# gamma component

hurdle_gamma <- glm(cost ~ disease + gender + age + offset(log(time)), 
                    data = subset(df, cost > 0),
                    family = Gamma(link = "log"))

model_parameters(hurdle_gamma, exponentiate = TRUE)

# binomial component

hurdle_binomial <-  glm(cost_binary ~ disease + gender + age + time, 
                        data = df, family = "binomial")

model_parameters(hurdle_binomial, exponentiate = TRUE)

# predicted probability of use

df$prob_use <- predict(hurdle_binomial, type = "response")

# predicted mean cost given any cost, for everyone (newdata = df2), so that
# people with zero observed cost also get a conditional mean

df2 <- df

df2$cost_gamma <- predict(hurdle_gamma, newdata = df2, type = "response")

# calculate predicted cost for everyone: P(cost > 0) * E[cost | cost > 0]

df2 <- df2 %>% mutate(cost_pred = prob_use * cost_gamma)

# mean predicted cost

mean(df2$cost_pred)

### glmmTMB with ziGamma

zigamma_model <- glmmTMB(cost ~ disease + gender + age + offset(log(time)),
                         family = ziGamma(link = "log"),
                         ziformula = ~ disease + gender + age + time,
                         data = df)

model_parameters(zigamma_model, exponentiate = TRUE)

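# Piping df into predict() passes the data frame as the first argument, which gives
# 'no applicable method for "predict" applied to an object of class "data.frame"'.
# Calling predict() on the model, with the argument spelled newdata, avoids that:
df$cost_zigamma <- predict(zigamma_model, newdata = df, type = "response")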

The coefficients from the gamma component of my hurdle model and the conditional (fixed-effects) component of the ziGamma model are the same, but the SEs differ, which in my actual data has substantial implications for the significance of my predictor of interest. The coefficients in the zero-inflation component differ from those in my binomial model, and I also noticed that their z values have the opposite sign to those in my binomial model. I assume this is because my binomial model models the probability of presence (1 is a success), whereas glmmTMB presumably models the probability of absence (0 is a success)?
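The guess about the sign flip can be checked without glmmTMB. Below is a minimal sketch on simulated data; the names `x`, `nonzero`, `fit_presence` and `fit_absence` are hypothetical and are not part of the models above. It fits the same logistic regression twice, once to the indicator of a non-zero outcome and once to the indicator of a zero.

```r
# Simulated example (hypothetical data, not the cost data above)
set.seed(42)
n <- 2000
x <- rnorm(n)
nonzero <- rbinom(n, size = 1, prob = plogis(0.5 + 0.8 * x))

# Model the probability of a non-zero outcome (1 = presence)
fit_presence <- glm(nonzero ~ x, family = binomial)

# Model the probability of a zero (1 = absence), the direction in which
# a zero-inflation component is parameterized
fit_absence <- glm(I(1 - nonzero) ~ x, family = binomial)

# Extract the z values from both fits and show them side by side
z_presence <- coef(summary(fit_presence))[, "z value"]
z_absence  <- coef(summary(fit_absence))[, "z value"]
cbind(z_presence, z_absence)
```

If the two columns match in magnitude and differ only in sign, the opposite-signed z values come from which outcome is coded as the success, not from a difference in fit.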

In sum, can anyone point out what I am doing wrong with the glmmTMB ziGamma model?
