Chuan
Chuan

Reputation: 71

Error in fitdist with gamma distribution

Below are my codes:

library(fitdistrplus)

s <- c(11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8,3, 15, 3, 9, 22, 0, 4, 10, 1, 9, 10, 11,
 2, 8, 2, 6, 0, 15, 0 , 2, 11, 0, 6, 3, 5, 0, 7, 6, 0, 7, 1, 0, 6, 4, 1, 3, 5,
 2, 6, 0, 10, 6, 4, 1, 17, 0, 1, 0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1,
 3, 2, 9, 4, 4, 1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7)

o <- fitdist(s, "gamma", method = "mle")
summary(o)
plot(o)

and the error says:

Error in fitdist(s, "gamma", method = "mle") : the function mle failed to estimate the parameters, with the error code 100

Upvotes: 7

Views: 7278

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226991

The Gamma distribution doesn't allow zero values (the likelihood will evaluate to zero, and the log-likelihood will be infinite, for a response of 0) unless the shape parameter is exactly 1.0 (i.e., an exponential distribution - see below) ... that's a statistical/mathematical problem, not a programming problem. You're going to have to find something sensible to do about the zero values. Depending on what makes sense for your application, you could (for example)

  • choose a different distribution to test (e.g. pick a censoring point and fit a censored Gamma, or fit a zero-inflated Gamma distribution, or ...)
  • exclude the zero values (fitdist(s[s>0], ...))
  • set the zero values to some sensible non-zero value (fitdist(replace(s,which(s==0),0.1),...)

which (if any) of these is best depends on your application.


@Sandipan Dey's first answer (leaving the zeros in the data set) appears to make sense, but in fact it gets stuck at the shape parameter equal to 1.

o <- fitdist(s, "exp", method = "mle")

gives the same answer as @Sandipan's code (except that it estimates rate=0.2161572, the inverse of the scale parameter=4.626262 that's estimated for the Gamma distribution - this is just a change in parameterization). If you choose to fit an exponential instead of a Gamma, that's fine - but you should do it on purpose, not by accident ...

To illustrate that the zeros-included fit may not be working as expected, I'll construct my own negative log-likelihood function and display the likelihood surface for each case.

mfun <- function(sh,sc,dd=s) {
  -sum(dgamma(dd,shape=sh,scale=sc,log=TRUE))
}

library(emdbook)  ## for curve3d() helper function

Zeros-included surface:

cc1 <- curve3d(mfun(x,y),
      ## set up "shape" limits" so we evaluate
      ## exactly shape=1.000 ...
        xlim=c(0.55,3.55),
        n=c(41,41),
        ylim=c(2,5),
        sys3d="none")


png("gammazero1.png")
with(cc1,image(x,y,z))
dev.off()

enter image description here

In this case the surface is only defined at shape=1 (i.e. an exponential distribution); the white regions represent infinite log-likelihoods. It's not that shape=1 is the best fit, it's that it's the only fit ...

Zeros-excluded surface:

cc2 <- curve3d(mfun(x,y,dd=s[s>0]),
               ## set up "shape" limits" so we evaluate
               ## exactly shape=1.000 ...
               xlim=c(0.55,3.55),
               n=c(41,41),
               ylim=c(2,5),
               sys3d="none")


png("gammazero2.png")
with(cc2,image(x,y,z))
with(cc2,contour(x,y,z,add=TRUE))
abline(v=1.0,lwd=2,lty=2)
dev.off()

enter image description here

Upvotes: 6

Sandipan Dey
Sandipan Dey

Reputation: 23139

Just provide the initial values for the gamma distribution parameters (scale, shape) to be computed with mle using optim and also the lower bounds for the parameters, it should work.

o <- fitdist(s, "gamma", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)

#Fitting of the distribution ' gamma ' by maximum likelihood 
#Parameters : 
#      estimate Std. Error
#scale 4.626262         NA
#shape 1.000000         NA
#Loglikelihood:  -250.6432   AIC:  505.2864   BIC:  510.4766 

enter image description here

As per the comments by @Ben Bolker, we may want to exclude the zero points first:

o <- fitdist(s[s!=0], "gamma", method = "mle", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)
#Fitting of the distribution ' gamma ' by maximum likelihood 
#Parameters : 
#      estimate Std. Error
#scale 3.401208         NA
#shape 1.622378         NA
#Loglikelihood:  -219.6761   AIC:  443.3523   BIC:  448.19

enter image description here

Upvotes: 0

Related Questions