Reputation: 83
I would like to define my own distributions to use with the fitdistrplus function to fit my monthly precipitation data from now on refered as "month". I am using the “lmomco” function to help me define the distributions, but cannot manage to make it work. For example, I am defining the generalized extreme value (gev) distribution like the following:
dgev<-pdfgev #functions which are included in lmomco
pgev<-cdfgev
qgev<-quagev
Since "fitdistrplus" needs the argument "start", which consists of the initial parameter values for the desired distribution, I am estimating these initial values as the following:
lmom=lmoms(month,nmom=5) #from lmomco package
para=pargev(lmom, checklmom=TRUE)
Now, I finally try using the “fitdist” function to fit “month” to the gev distribution as:
fitgev <- fitdist(month, "gev", start=para[2]) #fitdistrplus
I get an error like the one below. It does not matter which distribution I define with the help of “lmomco”, I get the same error. Could someone give me a hint on what am I doing wrong? Thanks!
fitgev <- fitdist(month, "gev", start=para[2])
[1] "Error in dgev(c(27.6, 97.9, 100.6, 107.3, 108.5, 109, 112.4, 120.9, 137.8, : \n unused arguments (para.xi = 196.19347977195, para.alpha = 91.9579520442104, para.kappa = -0.00762962879097294)\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in dgev(c(27.6, 97.9, 100.6, 107.3, 108.5, 109, 112.4, 120.9, 137.8, 138.4, 144.7, 156.8, 163.1, 168.9, 169.1, 171.4, 176.1, 177.1, 178.8, 178.9, 187.2, 190.2, 190.5, 190.8, 191.2, 193.1, 195.2, 198.5, 199.8, 201.7, 206.9, 213.4, 220.7, 240, 253.5, 254.5, 256.1, 256.4, 257.5, 258.3, 261.5, 263.7, 264.7, 279.1, 284.2, 313.1, 314.7, 319.4, 321.6, 328.9, 330.1, 332.2, 358.3, 366.8, 367.9, 403.5, 424.1, 425.9, 457.3, 459.7, 468, 497.1, 508.5, 547.1), para.xi = 196.19347977195, para.alpha = 91.9579520442104, para.kappa = -0.00762962879097294): unused arguments (para.xi = 196.19347977195, para.alpha = 91.9579520442104, para.kappa = -0.00762962879097294)>
Error in fitdist(month, "gev", start = para[2]) :
the function mle failed to estimate the parameters,
with the error code 100
Upvotes: 8
Views: 6186
Reputation: 226182
tl;dr this is fussy, and will probably always be fussy - fitting potentially unstable distributions to extremely small, noisy data sets, is just plain hard. I outline some strategies below that will get us an answer, but I don't really trust any of the answers I'm getting.
For the specific case here, @BelSmek's answer is best: evd::fgev(month)
gives answers matching mle2
/DEoptim
below, and gives much more plausible standard error estimates. However, all of the machinations below may be useful stuff for people trying to fit parameters to distributions in general ...
fitdist
is expecting a density/distribution function with named arguments, and a bunch more; we can make this work, although as I said I don't trust the answers.
library("lmomco")
library("fitdistrplus")
## reproducible:
month <- c(27.6, 97.9, 100.6, 107.3, 108.5,
109, 112.4, 120.9, 137.8)
Setup:
lmom <- lmoms(month,nmom=5) #from lmomco package
para <- pargev(lmom, checklmom=TRUE)
It turns out we need to redefine dgev
with quite a few additional bits of plumbing to make everyone happy:
pgev <- function(q, xi, alpha, kappa) {
if (length(q) == 0) return(numeric(0))
r <- try(cdfgev(x = q, para = c(xi = xi, alpha = alpha, kappa = kappa)),
silent = TRUE)
if (inherits(r, "try-error")) return(rep(NaN, length(q)))
r
}
dgev <- function(x,xi,alpha,kappa, minval = 1e-8) {
r <- pdfgev(x,list(type="gev",para=c(xi,alpha,kappa),source="pargev"))
r[r==0] <- minval
r
}
Possibly the most important thing here, besides changing the arguments from a vector to a list, is intercepting cases where the density function underflows to zero and replacing them by a small value. This is a hack that won't always work: the more principled approach is for the density function to compute the log-density directly (I'll try this below, although in this case it doesn't help that much).
fitgev <- fitdist(month, "gev", start=as.list(para[[2]]))
We get an answer ...
Parameters:
estimate Std. Error
xi 104.060486 0.0004131185
alpha 39.227041 0.0004150259
kappa 1.162644 0.0004105323
... but I don't trust this at all because the standard errors are unrealistically low (why would we think we could estimate the parameters this precisely when fitting a 3-parameter model to 9 data points ... ?)
An alternative approach uses bbmle::mle2
in conjunction with evd::dgev
— the latter does have a log
argument ...
## clean up
rm(dgev)
detach("package:lmomco")
## get new packages
library(evd)
library(bbmle)
(in general it would be better to start a fresh R session here ...)
I again had to wrap the dgev
function to substitute in something for impossible values (even though we're now working on the log scale, so things are somewhat more stable ...)
dgev <- function(..., log = FALSE, minval = 1e-8) {
r <- evd::dgev(..., log = log)
if (log) {
r[r == -Inf] <- log(minval)
}
r
}
fit2 <- mle2(month ~ dgev(loc = xi, scale = alpha, shape = kappa),
data = data.frame(month),
start = as.list(para[[2]]))
summary(fit2)
Note that the standard errors are now slightly more reasonable, but still surprisingly small, and that these answers are completely different from those we got from fitdistrplus
.
Coefficients:
Estimate Std. Error z value Pr(z)
xi 99.6720328 0.0765906 1301.36 < 2.2e-16 ***
alpha 30.7447099 0.3027090 101.57 < 2.2e-16 ***
kappa -0.7763013 0.0076273 -101.78 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-2 log L: 82.063
As a final brute-force approach, we'll try differential evolution
dgev_lik <- function(pars, minval = 1e-8) {
r <- evd::dgev(month, pars[1], pars[2], pars[3], log = TRUE)
r[r == -Inf] <- log(minval)
-1*sum(r)
}
library(DEoptim)
set.seed(101)
d1 <- DEoptim(dgev_lik, lower = c(90, 10, -2),
upper = c(130, 50, 2),
control = DEoptim.control(NP = 1000, itermax = 1000))
d1$optim
$bestmem
par1 par2 par3
99.6299712 30.7704978 -0.7762563
$bestval
[1] 41.03149
This is basically the same answer that mle2
got.
Looking at the guts of fitgev
, it claims to have a better log-likelihood than mle2
(logLik(fitgev)
is -36.9, vs. -41 for mle2
/DEoptim
), but it appears to be computing a non-comparable value: plugging the fitgev
parameters directly into our log-likelihood function gives a much worse answer (for negative loglikelihoods, higher values are worse ...)
dgev_lik(fitgev$estimate) ## 57.39
Upvotes: 7
Reputation: 11
here is another solution, if the example provided no longer works:
library(evd)
fitgev <- fgev(month)
# e.g. extract log-likelihood
logLik(fitgev)[[1]]
Upvotes: 1
Reputation: 19
Make sure the argument in you cumulative function has variable q: pgev(q, par1, par2)
instead of
pgev(x, par1, par2)
Because the error message essentially tells you that it couldn't find the variable q.
The keypoint is to use:
x
as pdf input
;q
as cdf input
;p
as inverse cdf input
For example, fitting a Gumble Distribution defined by yourself
# Data
x1 <- c(6.4,13.3,4.1,1.3,14.1,10.6,9.9,9.6,15.3,22.1,13.4,
13.2,8.4,6.3,8.9,5.2,10.9,14.4)
# Define pdf, cdf , inverse cdf
dgumbel <- function(x,a,b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q,a,b) exp(-exp((a-q)/b))
qgumbel <- function(p,a,b) a-b*log(-log(p))
# Fit with MLE
f1c <- fitdist(x1,"gumbel",start=list(a=10,b=5))
# Goodness of Fit
gofstat(f1c, fitnames = 'Gumbel MLE')
Reference: https://www.rdocumentation.org/packages/fitdistrplus/versions/0.2-1/topics/fitdist
Upvotes: 1