Renata Dis

Reputation: 133

Estimating the parameters for a Pareto type 2 in R

I have a continuous variable that ranges from 0 to 1 (percentage data, including 0s), and I want to determine the best distribution to model it. I'm using RStudio; the data in question are here. Note that about 27% of the observations are 0, and I do plan on exploring zero inflation as I go.

I checked the histogram and ECDF (see below) to get an idea of what I'm dealing with. fitdistrplus suggested a beta distribution, while gamlss suggested a Pareto Type 2, which I'm not very familiar with.

[Figures: histogram and ECDF of the data]

I've determined the parameters of a beta distribution and fit it, and used the KS test to check a few other distributions, but I am stuck on that Pareto Type 2. The problem: all my attempts at estimating location and scale fail. As far as I can tell, that's because of the zeroes in the dataset. It works if I add a tiny amount to the entire dataset (e.g. 0.0001), but honestly I'm not sure that is a good solution, and it would make comparing it to anything else a living hell. I tried EnvStats, VGAM, and CaDENCE, and all give me errors. So, I humbly come here in the hopes that someone can suggest another option for estimating the Pareto Type 2 parameters for that dataset.

Upvotes: 0

Views: 254

Answers (1)

Emmanuel Hamel

Reputation: 2223

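You can maximize the Pareto type 2 log-likelihood directly with a global optimizer such as DEoptim. The density is finite at zero, so the zeros in the data do not need an offset: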

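library(DEoptim)

# Assumes percentData.csv holds the data in a single numeric column
df <- read.csv("percentData.csv")
data <- unlist(df)

# Negative log-likelihood of the Pareto type 2 (Lomax) density
# f(x) = k / (s + x) * (s / (s + x))^k, with shape k > 0 and scale s > 0.
# Written on the log scale to avoid underflow for large k.
neg_log_lik <- function(param, data)
{
  k <- param[1]
  s <- param[2]
  -sum(log(k) + k * log(s) - (k + 1) * log(s + data))
}

# Lower bounds sit slightly above 0 so that log(k) and log(s) stay finite
obj_Res <- DEoptim(fn = neg_log_lik, lower = c(1e-8, 1e-8), upper = c(1000, 1000), data = data, control = list(parallelType = 1))
obj_Res$optim$bestmem  # estimated shape k and scale s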
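
As a sanity check on the likelihood used here, the sketch below fits the same Pareto type 2 density to simulated data that includes exact zeros, using only base R. The simulated sample, the true values `k_true` and `s_true`, and the helper name `lomax_nll` are assumptions made for illustration; they are not part of the original data or answer. Optimizing over log-parameters is one possible way to keep the shape and scale positive without box constraints.

```r
# Negative log-likelihood of the Pareto type 2 (Lomax) density
# f(x) = (k / s) * (1 + x / s)^(-(k + 1)), for x >= 0, k > 0, s > 0.
# Parameters are passed on the log scale so the optimizer is unconstrained.
lomax_nll <- function(log_param, x) {
  k <- exp(log_param[1])  # shape
  s <- exp(log_param[2])  # scale
  -sum(log(k) - log(s) - (k + 1) * log1p(x / s))
}

# Simulate Lomax data by inverting the CDF F(x) = 1 - (s / (s + x))^k
set.seed(1)
k_true <- 3     # hypothetical shape
s_true <- 0.5   # hypothetical scale
u <- runif(5000)
x_sim <- s_true * ((1 - u)^(-1 / k_true) - 1)

# Force a few exact zeros: f(0) = k / s is finite, so no offset is needed
x_sim[1:10] <- 0

# Nelder-Mead from log(k) = log(s) = 0, i.e. k = s = 1
fit <- optim(par = c(0, 0), fn = lomax_nll, x = x_sim)
est <- exp(fit$par)
names(est) <- c("shape", "scale")
print(est)
```

If the estimates land near the simulated values, the same function can be pointed at the real data vector in place of `x_sim`. A local optimizer like this can depend on its starting point, which is one reason to prefer a global search such as DEoptim when the likelihood surface is flat.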

Upvotes: 1
