Michal
Michal

Reputation: 11

Fitting empirical distribution to theoretical ones with R with bounds of parameters?

INTRODUCTION: I have a list of 10 numbers. I would like to fit genpareto.

plot of shapes

PROBLEM: model has been math with not provide shape ( green shape). I would like to put bounds on shape parameter during fit. Based on shape parameter the curve get the different effect. From the automated results we get value of shape parameter smaller than zero ( green shape). I am looking a way how to get the red shape of curve. The matched parameter should be interval (0,0.5). The code can be found below:

library("fitdistrplus")
library("actuar")
dataPar=c(17.5, 17.8, 22.4, 22.6 ,25, 25.2, 27.5, 30, 32.4, 32.5)
fitdist(dataPar, "genpareto", lower=c(-Inf, 0, -Inf), start=list(shape1 = min(dataPar), shape2 = 1,  rate = 1), upper=c(+Inf, 0.5, +Inf))

Upvotes: 0

Views: 244

Answers (1)

Avraham
Avraham

Reputation: 1719

There are a few significant problems with your question:

  1. The generalized Pareto of Klugman, Panjer, and Willmot—which is what Goulet uses in actuar—is a three-parameter distribution. You are not going to get a good fit with only 10 data points.
  2. There are two shape parameters. In R, shape1 corresponds to α and shape2 to τ. You are not making clear which to which shape parameter you refer.
  3. The shape parameters are a constant. Plotting a curve for shape does not make any sense. What you can plot is the CDF based on the fitted shape and scale parameters.
  4. Your curve seems to be plotting a survival function, not a CDF, as it decreases as the probability increases.

That being said, here is a possible example of what you intended to do.

For example:

# Data
dataPar <- c(17.5, 17.8, 22.4, 22.6 ,25, 25.2, 27.5, 30, 32.4, 32.5)

# Negative Loglikelihood function
nLL <- function(par, data) {
  -sum(actuar::dgenpareto(data, par[1L], par[2L], scale = par[3L], log = TRUE))
}

# General-purpose robust non-linear optimization package
library(nloptr)

fit <- nloptr(c(1, 2, 3), eval_f = nLL, data = dataPar,
              opts = list(algorithm = "NLOPT_LN_NELDERMEAD", maxeval = 1e5L))

fit

Call:
nloptr(x0 = c(1, 2, 3), eval_f = nLL, opts = list(algorithm = "NLOPT_LN_NELDERMEAD", 
    maxeval = 100000L), data = dataPar)


Minimization using NLopt version 2.7.1 

NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs 
(above) was reached. )

Number of Iterations....: 1185 
Termination conditions:  maxeval: 100000 
Number of inequality constraints:  0 
Number of equality constraints:    0 
Optimal value of objective function:  30.5855340735935 
Optimal value of controls: 571935643 23.56514 613707172


# Create y elements
yy <- seq(17.5, 35, 0.5)
# Calculate survival function at each y element for x value
xx <- actuar::pgenpareto(yy, fit$solution[1], fit$solution[2], scale = fit$solution[3], lower.tail = FALSE)
# Plot similar to OP
plot(xx, yy, type = "o", ylim = c(17.5, 35))

# Add rough empirical survival by plotting dataPar against 0.95, 0.85, etc.
ecdfDP <- seq(0.95, 0.05, -0.1)
points(ecdfDP, dataPar, col = "red", pch = 16L)

Plot of Survival Function

The plot looks similar to what you posted, and the empirical data points are similar to the fitted survival, which indicates a reasonable fit. However, the parameters are very large, possibly diverging, which indicates that the Generalized Pareto is not an appropriate family for this data. Consider a more parsimonious distribution like the Pareto or the Weibull, if not an exponential.

Upvotes: 0

Related Questions