Reputation: 108
I need to do exactly what @interstellar asked here Fit poisson distribution to data but within the R environment (not matlab).
So, I created a barplot with my observed values and I just need to fit a poisson distribution on it.
Here my data:
df = read.table(text = 'Var1 Freq
6 1
7 2
8 5
9 7
10 9
11 6
12 4
13 3
14 2
15 1', header = TRUE)
the barplot created is the following:
t = barplot(df$Freq, ylim = c(0,10))
axis(1, at=t, labels=df$Var1)
I am still new to R so how I could use the fitdist
function or something else to create a line above my barplot?
Any help would be really appreciated.
UPDATE
I have worked out something but I am not sure 100% if it is correct:
#create barplot
t = barplot(df$Freq, ylim = c(0,10))
axis(1, at=t, labels=df$Var1)
#find lambda value from my data
pois = fitdist(df$Freq, 'pois', method = 'mle')
print(pois)
#result
Fitting of the distribution ' pois ' by maximum likelihood
Parameters:
estimate Std. Error
lambda 4 0.6324555
#create 10 values from a real poisson distribution
dist = dpois(1:10, lambda = 4)
#multiply them by `sum(df$Freq)` in order to scale them to the barplot
dist = dist * sum(df$Freq)
#add the line plot to the original barplot
lines(dist, lwd = 2)
However, the curve is not smooth..
Upvotes: 1
Views: 7536
Reputation: 17193
The package vcd
comes with the goodfit()
function which essentially does exactly what you ask for: fit the model by ML and then visualize observed and fitted frequencies. By default, a square-root scale is adopted to better bring out departures at lower expected frequencies. Also, by default, the bars are hanging from the curve to align all deviations along the axis. This version is called rootogram (see our recent discussion in The American Statistician for more details). The defaults can be changed though to get a standing barplot on the raw scale:
gf <- goodfit(df[, 2:1], "poisson")
plot(gf, type = "standing", scale = "raw")
plot(gf, type = "hanging", scale = "sqrt")
Attention: Also note that in your version of the code you obtain exactly 4
as the MLE because you just use $Freq
in the estimation but not $Var1
. My version of the code assumes that your data is meant to have 1 observation of a 6, 2 observations of a 7, etc. The code may have to be adapted if this is not what you mean.
Upvotes: 3
Reputation: 108
#fit poisson distr to tbl$Freq data
poisson = fitdist(df$Freq, 'pois', method = 'mle')
print(poisson)
#plot
plot(df$Var1, df$Freq, type = 'h', ylim = c(0,10), ylab = 'No. of years with x events',
xlab = 'No. of events in a year', main = 'All 13-day events with Poisson')
dist = dpois(1:10, lambda = 4)
dist = dist * sum(df$Freq)
dist = as.data.frame(dist)
dist$Var1 = df$Var1
lines(dist$Var1, dist$dist, lwd = 2)
Upvotes: 0