convex895

Reputation: 25

Poisson Based Regression models code in R runs very slow

I am working with count data and trying several different Poisson fixed-effects regression models, using zeroinfl (from the pscl package) for zero-inflated models and pglm (from the pglm package) for models that are not zero-inflated. However, my R code runs very slowly: it takes more than 9-10 hours. For clarification, I am adding fixed effects manually by including time and ID dummies.

model<- zeroinfl(y~ x1+ x2+ x3+ x4 + as.factor(time) 
               + as.factor(ID) | 1, data = df, dist = "poisson")

I am aware of this question: R Zeroinfl model. However, my data are highly zero-inflated, with a mean of 0.587 and a median of 0, and I am afraid this feature of the data could be lost with the methods suggested there. I am fairly new to R. Any help is appreciated.

Upvotes: 0

Views: 432

Answers (1)

Ben Bolker

Reputation: 227061

Given what you've said so far, it may be worth trying

library(glmmTMB)
model <- glmmTMB(y ~ x1 + x2 + x3 + x4 + as.factor(time)
               + as.factor(ID),
          ziformula = ~ 1,   # intercept-only zero-inflation component
          data = df,
          family = "poisson",
          sparseX = c(cond = TRUE))   # sparse model matrix for the conditional model

You can do whatever you like with the zero-inflation component (e.g. ziformula = ~ x1 + x2 + x3 + x4 to include those covariates). If you want the zero-inflation model matrix to be sparse as well, add zi = TRUE to the sparseX vector.

The reason for suggesting this (particularly the sparseX argument) is that building the model matrix for a data set with 87K rows and 2500 IDs with zeroinfl will (I think) create a dense model matrix of approximately 2500*87e3*8/2^30 = 1.620501 gigabytes ...
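
To make the size argument concrete, here is a rough sketch that repeats the arithmetic and compares dense and sparse model matrices on a small simulated panel. The data frame df_small, its dimensions, and the helper dense_gib are hypothetical and only for illustration; the real data are much larger, and this is not a measurement of what zeroinfl or glmmTMB actually allocate internally.

```r
library(Matrix)  # recommended package, provides sparse.model.matrix()

# Back-of-envelope size (in GiB) of a dense matrix of doubles (8 bytes each).
# Hypothetical helper, not part of any package.
dense_gib <- function(n_rows, n_cols) n_rows * n_cols * 8 / 2^30
est <- dense_gib(87e3, 2500)  # ID dummy columns only, as in the estimate above
print(est)

# Small simulated panel: 200 IDs observed at 10 time points (assumed sizes).
set.seed(1)
df_small <- expand.grid(ID = seq_len(200), time = seq_len(10))
df_small$x1 <- rnorm(nrow(df_small))

f <- ~ x1 + as.factor(time) + as.factor(ID)
X_dense  <- model.matrix(f, data = df_small)         # stores every zero
X_sparse <- sparse.model.matrix(f, data = df_small)  # stores non-zeros only

print(dim(X_dense))
print(object.size(X_dense), units = "Kb")
print(object.size(X_sparse), units = "Kb")
```

Each row of a dummy-coded design has only a handful of non-zero entries (intercept, covariates, one time dummy, one ID dummy), so the gap between the two representations widens as the number of IDs grows.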

Upvotes: 3
