Reputation: 1
##Here I used the 'discoveries' dataset from the faraway package.
library('faraway')
data("discoveries")
#made a variable 'year' from 1860 to 1959
year = 1860:1959
#here I fit a poisson regression model with discoveries as dependent and year as independent.
fit_pois = glm(discoveries ~ year, data = discoveries, family = poisson)
##The question is, what is the probability that there are 4 discoveries in year '1960'(assuming model is right and is predicting the future). I tried to do this with
pred_pr = predict.glm(fit_pois, data.frame(discoveries = 4, year = 1960, type = 'response'))
##However when I predict on the data, it gives out numbers that are not probabilities. Plz Help!!
Upvotes: 0
Views: 741
Reputation: 174468
Let's start by replicating your model:
library('faraway')
data("discoveries")
year = 1860:1959
fit_pois <- glm(discoveries ~ year, data = discoveries, family = poisson)
Now if we use predict
, our fit_pois
model will tell us the predicted rate of discoveries for any given year(s). It will completely ignore any discoveries
in the data frame passed to the newdata
parameter of predict
, because our model predicts discoveries based solely on the year
variable.
Note also that in your example you have included type = "response"
as a variable in the newdata
data frame, rather than passing it as a parameter to predict
. So the prediction line should look like this:
pred_pr = predict.glm(fit_pois, newdata = data.frame(year = 1960), type = 'response')
And the result we get is:
pred_pr
#> 1
#> 2.336768
Let's think what this means. Since we are doing a Poisson regression, this number represents the expected value of discoveries
in the year 1960. This means that we can estimate the probability of there being exactly 4 discoveries if we examine the Poisson distribution with an expected value (also known as a lambda) of 2.336768. Let's use dpois
to see the probabilities of getting 0 to 6 discoveries if the lambda is 2.336768:
plot(0:6, dpois(0:6, pred_pr), type = "h")
So the probability of there being exactly 4 discoveries in 1960 is:
dpois(4, pred_pr)
#> [1] 0.1200621
i.e. almost exactly 12%
Upvotes: 1