Yang Yang
Yang Yang

Reputation: 900

How to find fitted values from `fitdistrplus` package in R?

I am now using the package fitdistrplusto construct a Gamma distribution and my question is how can I extract the fitted values in order to calculate the root mean squared error? Thanks for any help.

library(fitdistrplus)
Sev = c(1.42,5.15,2.5,2.29,12.36,2.82,1.4,3.53,1.17,1.0,4.03,5.26,1.65,1.41,3.75,1.09,
    3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,
    4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,
    1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
fg <- fitdist(data = Sev, distr = "gamma", method = "mle") 

Upvotes: 1

Views: 1047

Answers (1)

Julius Vainora
Julius Vainora

Reputation: 48211

This is not a regression context, there are no clear fitted values here. What you may have in mind is estimated density values f(Sev; theta), where theta are the estimates given by fg. That would be

fit <- dgamma(Sev, fg$estimate[1], fg$estimate[2])

and it is a meaningful and well-defined object. However, you get into trouble when trying to compute RMSE: what is going to be the thing that you will compare fit with? What is your sample density value at 1.42? Since you are dealing with a continuous distribution, you are going to have to use some kernel estimator, which again has a parameter - bandwidth! A very crude thing would be

den <- density(Sev)
sqrt(mean((den$y - dgamma(den$x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.0146867

That's RMSE between the MLE estimate given by fg and the kernel density estimate den. Using the np package you could estimate the density better than with density.

There is something more sensible that you can do: comparing the empirical CDF of your data and the CDF given by fg. The former is given by empCDF <- ecdf(Sev) and the latter by pgamma with the corresponding parameter values. Then, for instance, the Kolmogorov-Smirnov statistic would be approximately

x <- seq(min(Sev), max(Sev), length = 10000)
max(abs(empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2])))
# [1] 0.1725476

and a kind of RMSE would be

sqrt(mean((empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.04585509

(Both statistics can be made more precise with optim and integrate, respectively).

To sum up, since it's not a regression context, things are different and, depending on how rigorous you want to be, there are many alternatives to explore.

Upvotes: 2

Related Questions