Joseph Coolidge
Joseph Coolidge

Reputation: 1

Prior distribution for zero-inflated Poisson MCMCglmm?

I'm quite new to Bayesian statistics and not sure how to determine the dimensions of my priors. I am using code from Ben Bolker to run quite a big MCMCglmm with 52 response variables, 7 fixed effects, and 1 random effect. The response variables are arthropod taxa abundances, which are all counts with many, many zeros.

For MRE purposes, I scaled down the test to just the first 3 response variables and first 3 fixed effects. It looks like this:

datum <- read.csv("2024_data_raw_MCMCglmm.csv")

datum$Site <- factor(datum$Site) #my random effect variable

abund_vars <- c("Lycosidae","Gnaphosidae","Pholcidae") #my response variables
abund_vars_sc <- paste(abund_vars,"s",sep="_") #scaling variables
datum2 <- datum

fmla.MMLM1 <- cbind(Lycosidae, Gnaphosidae, Pholcidae) ~
  trait:(ShrubVolume + X.Leafy + P_cover_shrubs) - 1
#traits are my fixed effects

fam.test <- rep("zipoisson", 3) #same family for each variate

prior.model.1 <- list(R = list(V=diag(3)/3, nu=0.003),
                      G = list(G1=list(V=diag(3)/3, nu = 0.003)))
##what is wrong with my prior model?? the following line does not like it!!##

t2 <- system.time(
  MMLM1.fit <- MCMCglmm(fmla.MMLM1,
                        random=~ us(trait):Site,
                        rcov=~ us(trait):units,
                        prior = prior.model.1,
                        data = datum2,
                        family = fam.test,
                        nitt= 10000, burnin= 2000, thin=10)
)

Running the last line of code should do 10000 Markov chain Monte Carlo iterations that I can then summary() to get effect sizes and p-values (among other things I can't easily interpret), but I keep getting this error message:

Error in priorformat(if (NOpriorG) { : 
  V is the wrong dimension for some prior$G/prior$R elements
Timing stopped at: 0.21 0 0.22

From my understanding, V is supposed to be diag(3) because I have 3 response variables, in this case. What am I missing?

What confuses me even more is that the code worked just fine when my family was just "poisson" with these same priors. Does zero-inflated change what the priors need to be?

Upvotes: 0

Views: 60

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226742

tl;dr it looks like the covariance matrices need to be 6x6 instead of 3x3 (i.e., the model has 2 parameters per column of the response matrix rather than 1 ...). Below I'll show how I figured this out.

run with Poisson, original priors

(Code to set up example data is at the end of this answer.) To my surprise (since the original data set is tiny) this ran fine (although of course I didn't look at MCMC diagnostics etc.):

fam1 <- rep("poisson", 3)
prior.model.1 <- list(R = list(V=diag(3)/3, nu=0.003),
                      G = list(G1=list(V=diag(3)/3, nu = 0.003)))
fit1 <- MCMCglmm(form,
                 random=~ us(trait):Site,
                 rcov=~ us(trait):units,
                 prior = prior.model.1,
                 data = datum,
                 family = fam1)

now try with zipoisson

This replicates the error described above.

fam2 <- rep("zipoisson", 3)
fit2 <- MCMCglmm(form,
                 random=~ us(trait):Site,
                 rcov=~ us(trait):units,
                 prior = prior.model.1,
                 data = datum,
                 family = fam2)

what now?

When this happens I use a combination of traceback() (see the call stack when the problem happened), options(error=recover) (drop into browser when R encounters an error), and debug(MCMCglmm:::priorformat) to step through the code until we are right before the location of the error.

Then we run some bits of code in the browser. This is the test that fails:

any(dim(prior$V) != sum(nfl)) ## TRUE

Here dim(prior$V) is (3,3) and sum(nfl) is 6. This tells us that we need 6x6 rather than 3x3 prior covariance matrices.

This works:

prior.model.2 <- list(R = list(V=diag(6)/6, nu=0.003),
                      G = list(G1=list(V=diag(6)/6, nu = 0.003)))
fit3 <- MCMCglmm(form,
                 random=~ us(trait):Site,
                 rcov=~ us(trait):units,
                 prior = prior.model.2,
                 data = datum,
                 family = fam2)

PS you don't necessarily need a zero-inflated model for data with lots of zeros — such data may also be consistent with a Poisson model that has a low mean ...

Warton, David I. 2005. “Many Zeros Does Not Mean Zero Inflation: Comparing the Goodness-of-Fit of Parametric Models to Multivariate Abundance Data.” Environmetrics 16 (3): 275–89. https://doi.org/10.1002/env.702.


set up example

set.seed(101)

ab_vars <- (replicate(3, rpois(20, lambda = 0.2))
    |> data.frame()
    |> setNames(c("L","P","G"))
)
env_vars <-  (replicate(3, rnorm(20))
    |> data.frame()
    |> setNames(c("env1","env2","env3"))
)
datum <- data.frame(Site = factor(1:20), ab_vars, env_vars)
form <- cbind(L, P, G) ~ trait:(env1 + env2 + env3) - 1

Upvotes: 3

Related Questions