Reputation: 71
Below are my codes:
library(fitdistrplus)
s <- c(11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8,3, 15, 3, 9, 22, 0, 4, 10, 1, 9, 10, 11,
2, 8, 2, 6, 0, 15, 0 , 2, 11, 0, 6, 3, 5, 0, 7, 6, 0, 7, 1, 0, 6, 4, 1, 3, 5,
2, 6, 0, 10, 6, 4, 1, 17, 0, 1, 0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1,
3, 2, 9, 4, 4, 1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7)
o <- fitdist(s, "gamma", method = "mle")
summary(o)
plot(o)
and the error says:
Error in fitdist(s, "gamma", method = "mle") : the function mle failed to estimate the parameters, with the error code 100
Upvotes: 7
Views: 7278
Reputation: 226991
The Gamma distribution doesn't allow zero values (the likelihood will evaluate to zero, and the log-likelihood will be infinite, for a response of 0) unless the shape parameter is exactly 1.0 (i.e., an exponential distribution - see below) ... that's a statistical/mathematical problem, not a programming problem. You're going to have to find something sensible to do about the zero values. Depending on what makes sense for your application, you could (for example)
fitdist(s[s>0], ...)
)fitdist(replace(s,which(s==0),0.1),...)
which (if any) of these is best depends on your application.
@Sandipan Dey's first answer (leaving the zeros in the data set) appears to make sense, but in fact it gets stuck at the shape parameter equal to 1.
o <- fitdist(s, "exp", method = "mle")
gives the same answer as @Sandipan's code (except that it estimates rate=0.2161572, the inverse of the scale parameter=4.626262 that's estimated for the Gamma distribution - this is just a change in parameterization). If you choose to fit an exponential instead of a Gamma, that's fine - but you should do it on purpose, not by accident ...
To illustrate that the zeros-included fit may not be working as expected, I'll construct my own negative log-likelihood function and display the likelihood surface for each case.
mfun <- function(sh,sc,dd=s) {
-sum(dgamma(dd,shape=sh,scale=sc,log=TRUE))
}
library(emdbook) ## for curve3d() helper function
Zeros-included surface:
cc1 <- curve3d(mfun(x,y),
## set up "shape" limits" so we evaluate
## exactly shape=1.000 ...
xlim=c(0.55,3.55),
n=c(41,41),
ylim=c(2,5),
sys3d="none")
png("gammazero1.png")
with(cc1,image(x,y,z))
dev.off()
In this case the surface is only defined at shape=1 (i.e. an exponential distribution); the white regions represent infinite log-likelihoods. It's not that shape=1 is the best fit, it's that it's the only fit ...
Zeros-excluded surface:
cc2 <- curve3d(mfun(x,y,dd=s[s>0]),
## set up "shape" limits" so we evaluate
## exactly shape=1.000 ...
xlim=c(0.55,3.55),
n=c(41,41),
ylim=c(2,5),
sys3d="none")
png("gammazero2.png")
with(cc2,image(x,y,z))
with(cc2,contour(x,y,z,add=TRUE))
abline(v=1.0,lwd=2,lty=2)
dev.off()
Upvotes: 6
Reputation: 23139
Just provide the initial values for the gamma distribution parameters (scale, shape) to be computed with mle
using optim
and also the lower bounds for the parameters, it should work.
o <- fitdist(s, "gamma", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)
#Fitting of the distribution ' gamma ' by maximum likelihood
#Parameters :
# estimate Std. Error
#scale 4.626262 NA
#shape 1.000000 NA
#Loglikelihood: -250.6432 AIC: 505.2864 BIC: 510.4766
As per the comments by @Ben Bolker, we may want to exclude the zero points first:
o <- fitdist(s[s!=0], "gamma", method = "mle", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)
#Fitting of the distribution ' gamma ' by maximum likelihood
#Parameters :
# estimate Std. Error
#scale 3.401208 NA
#shape 1.622378 NA
#Loglikelihood: -219.6761 AIC: 443.3523 BIC: 448.19
Upvotes: 0