I would like to model Value as a function of the explanatory variables Type and Material (Value ~ Material + Type). In the sample data below, Material X has zero Values for every observation except one, which makes the distribution of Value zero-inflated across all observations. According to the model diagnostics, the assumptions of a linear model do not hold here.
Value is a numeric variable, and all observations are independent of each other.
I would like to know how I can find an appropriate distribution for these data, or transform them in a way that lets me handle these zeros.
I have read about the gamlss and pscl packages, but I struggled to apply them to my data.
ID <- seq(from = 1, to = 36)
Type <- rep(c("A", "B"), each = 18)
Material <- rep(c("X", "Y", "Z", "X", "Y", "Z"), each = 6)
Value <- c(0,0,0,2,0,0,27,50,30,103,104,223,147,
127,115,78,148,297,0,0,0,0,0,0,84,
59,56,53,64,86,90,75,95,111,215,191)
test.data <- data.frame(ID,Type,Material,Value)
test.data$ID <- factor(test.data$ID)
test.data$Type <- factor(test.data$Type)
test.data$Material <- factor(test.data$Material)
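To see exactly where the zeros sit, a quick base-R sketch (it rebuilds the same test.data so that it runs on its own) counts the zero Values in each Material-by-Type cell and the overall share of zeros:

```r
# Rebuild the example data from the question so this block is self-contained
Type <- rep(c("A", "B"), each = 18)
Material <- rep(c("X", "Y", "Z", "X", "Y", "Z"), each = 6)
Value <- c(0, 0, 0, 2, 0, 0, 27, 50, 30, 103, 104, 223, 147,
           127, 115, 78, 148, 297, 0, 0, 0, 0, 0, 0, 84,
           59, 56, 53, 64, 86, 90, 75, 95, 111, 215, 191)
test.data <- data.frame(Type = factor(Type), Material = factor(Material),
                        Value = Value)

# Number of zero Values per cell (rows = Material, columns = Type; 6 obs per cell)
zero.counts <- tapply(test.data$Value == 0,
                      list(test.data$Material, test.data$Type), sum)
print(zero.counts)

# Overall proportion of zeros across all 36 observations
zero.share <- mean(test.data$Value == 0)
print(zero.share)
```

All zeros fall in Material X, so the zero inflation is concentrated in one level of one factor rather than spread across the data.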
You could try:
library(gamlss)
m1 <- gamlss(Value ~ Material + Type, sigma.fo = ~ Material + Type,
             family = ZIP, data = test.data)
ZIP(mu, sigma) is a zero-inflated Poisson distribution: a mixture of zero with probability sigma and a Poisson distribution PO(mu) with probability (1 - sigma).
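Written out, this mixture gives P(Y = 0) = sigma + (1 - sigma) exp(-mu) and P(Y = y) = (1 - sigma) exp(-mu) mu^y / y! for y = 1, 2, .... The sketch below codes that probability mass function in base R. dzip_sketch is a hypothetical helper for illustration only (the gamlss.dist package provides its own dZIP), and the mu and sigma values are arbitrary:

```r
# Hypothetical helper: zero-inflated Poisson pmf, a mixture of a point mass
# at zero (probability sigma) and PO(mu) (probability 1 - sigma)
dzip_sketch <- function(y, mu, sigma) {
  pois <- dpois(y, lambda = mu)               # Poisson component
  ifelse(y == 0, sigma + (1 - sigma) * pois,  # zeros come from both components
         (1 - sigma) * pois)                  # positive counts only from the Poisson part
}

mu <- 3       # arbitrary illustrative values
sigma <- 0.4
p <- dzip_sketch(0:100, mu, sigma)
print(p[1:4])  # P(Y = 0), P(Y = 1), P(Y = 2), P(Y = 3)
print(sum(p))  # essentially 1, since the Poisson tail beyond 100 is negligible
```

The extra mass at zero is what lets the model accommodate the Material X observations without distorting the Poisson mean for the other materials.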
You could then check the residuals using plot(m1) or a worm plot, wp(m1).
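Residual plots are the main check, but a rough, informal hint of whether a Poisson component is plausible is to compare the mean and variance of Value within each Material-by-Type cell: under a Poisson model they should be of similar size. This base-R sketch is a heuristic, not a formal test, and rebuilds the data so it runs on its own:

```r
Type <- factor(rep(c("A", "B"), each = 18))
Material <- factor(rep(c("X", "Y", "Z", "X", "Y", "Z"), each = 6))
Value <- c(0, 0, 0, 2, 0, 0, 27, 50, 30, 103, 104, 223, 147,
           127, 115, 78, 148, 297, 0, 0, 0, 0, 0, 0, 84,
           59, 56, 53, 64, 86, 90, 75, 95, 111, 215, 191)

# Within-cell means and variances (rows = Material, columns = Type)
cell.mean <- tapply(Value, list(Material, Type), mean)
cell.var <- tapply(Value, list(Material, Type), var)

# Variance-to-mean ratio: values well above 1 suggest overdispersion relative
# to a Poisson; the all-zero X/B cell gives 0/0 = NaN
ratio <- cell.var / cell.mean
print(round(ratio, 1))
```

Ratios far above 1 in the non-zero cells point towards a distribution with extra variance, such as the negative binomial.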
The model may not be adequate, in which case a zero-inflated negative binomial distribution, ZINBI(mu, sigma, nu), may be needed. It is a mixture of zero with probability nu and a negative binomial distribution NBI(mu, sigma) with probability (1 - nu):
m2 <- gamlss(Value ~ Material + Type, sigma.fo = ~ Material + Type,
             nu.fo = ~ Material + Type, family = ZINBI, data = test.data)
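The ZINBI mixture can be written out the same way as the ZIP one. This sketch assumes the gamlss NBI(mu, sigma) parameterization with mean mu and variance mu + sigma * mu^2, which corresponds to base R's dnbinom(y, size = 1/sigma, mu = mu). dzinbi_sketch is a hypothetical helper and the parameter values are arbitrary; the numeric mean should equal (1 - nu) * mu, with a variance far above the mean:

```r
# Hypothetical helper: zero-inflated negative binomial pmf, a point mass at
# zero with probability nu and NBI(mu, sigma) with probability 1 - nu.
# Assumes NBI has mean mu and variance mu + sigma * mu^2,
# i.e. dnbinom(size = 1/sigma, mu = mu)
dzinbi_sketch <- function(y, mu, sigma, nu) {
  nb <- dnbinom(y, size = 1 / sigma, mu = mu)
  ifelse(y == 0, nu + (1 - nu) * nb, (1 - nu) * nb)
}

mu <- 100; sigma <- 0.5; nu <- 0.3  # arbitrary illustrative values
y <- 0:20000                        # wide enough that the truncated tail is negligible
p <- dzinbi_sketch(y, mu, sigma, nu)

# Mean and variance of the mixture, computed numerically from the pmf
m <- sum(y * p)
v <- sum((y - m)^2 * p)
print(c(mean = m, variance = v))
```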
Alternatively, an interaction term may be needed for mu (and/or for sigma or nu), e.g.:
m3 <- gamlss(Value ~ Material * Type, sigma.fo = ~ Material + Type,
             family = ZIP, data = test.data)
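To see what the interaction adds to the mu formula, compare the model matrices: with Material + Type the six Material-by-Type cells share 4 coefficients, whereas Material * Type has 6 columns, i.e. one free mean per cell. A base-R sketch using the factor levels from the question (default treatment contrasts assumed):

```r
Type <- factor(rep(c("A", "B"), each = 18))
Material <- factor(rep(c("X", "Y", "Z", "X", "Y", "Z"), each = 6))

# Additive predictor: intercept + 2 Material contrasts + 1 Type contrast
X.add <- model.matrix(~ Material + Type)
# Interaction predictor: additionally MaterialY:TypeB and MaterialZ:TypeB
X.int <- model.matrix(~ Material * Type)

print(colnames(X.add))
print(colnames(X.int))
```

Because the X/B cell contains only zeros, the interaction model may push that cell's fitted mean towards zero, which can show up as a very large negative coefficient on the log scale.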