Reputation: 900
I am now using the package fitdistrplus
to construct a Gamma distribution and my question is how can I extract the fitted values in order to calculate the root mean squared error? Thanks for any help.
library(fitdistrplus)
Sev = c(1.42,5.15,2.5,2.29,12.36,2.82,1.4,3.53,1.17,1.0,4.03,5.26,1.65,1.41,3.75,1.09,
3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,
4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,
1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
fg <- fitdist(data = Sev, distr = "gamma", method = "mle")
Upvotes: 1
Views: 1047
Reputation: 48211
This is not a regression context, there are no clear fitted values here. What you may have in mind is estimated density values f(Sev; theta), where theta are the estimates given by fg
. That would be
fit <- dgamma(Sev, fg$estimate[1], fg$estimate[2])
and it is a meaningful and well-defined object. However, you get into trouble when trying to compute RMSE: what is going to be the thing that you will compare fit
with? What is your sample density value at 1.42? Since you are dealing with a continuous distribution, you are going to have to use some kernel estimator, which again has a parameter - bandwidth! A very crude thing would be
den <- density(Sev)
sqrt(mean((den$y - dgamma(den$x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.0146867
That's RMSE between the MLE estimate given by fg
and the kernel density estimate den
. Using the np
package you could estimate the density better than with density.
There is something more sensible that you can do: comparing the empirical CDF of your data and the CDF given by fg
. The former is given by empCDF <- ecdf(Sev)
and the latter by pgamma
with the corresponding parameter values. Then, for instance, the Kolmogorov-Smirnov statistic would be approximately
x <- seq(min(Sev), max(Sev), length = 10000)
max(abs(empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2])))
# [1] 0.1725476
and a kind of RMSE would be
sqrt(mean((empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.04585509
(Both statistics can be made more precise with optim
and integrate
, respectively).
To sum up, since it's not a regression context, things are different and, depending on how rigorous you want to be, there are many alternatives to explore.
Upvotes: 2