Jae Lee
Jae Lee

Reputation: 1

(R)Discovery dataset from faraway package: How can you make predictions with both dependent and independent variables? Poission Regression

##Here I used the 'discoveries' dataset from the faraway package.

library('faraway')
data("discoveries")

#made a variable 'year' from 1860 to 1959

year = 1860:1959

#here I fit a poisson regression model with discoveries as dependent and year as independent.

fit_pois =  glm(discoveries ~ year, data = discoveries, family = poisson)

##The question is, what is the probability that there are 4 discoveries in year '1960'(assuming model is right and is predicting the future). I tried to do this with

pred_pr = predict.glm(fit_pois, data.frame(discoveries = 4, year = 1960, type = 'response'))

##However when I predict on the data, it gives out numbers that are not probabilities. Plz Help!!

Upvotes: 0

Views: 741

Answers (1)

Allan Cameron
Allan Cameron

Reputation: 174468

Let's start by replicating your model:

library('faraway')
data("discoveries")

year = 1860:1959
fit_pois <- glm(discoveries ~ year, data = discoveries, family = poisson)

Now if we use predict, our fit_pois model will tell us the predicted rate of discoveries for any given year(s). It will completely ignore any discoveries in the data frame passed to the newdata parameter of predict, because our model predicts discoveries based solely on the year variable.

Note also that in your example you have included type = "response" as a variable in the newdata data frame, rather than passing it as a parameter to predict. So the prediction line should look like this:

pred_pr = predict.glm(fit_pois, newdata = data.frame(year = 1960), type = 'response')

And the result we get is:

pred_pr
#>        1 
#> 2.336768

Let's think what this means. Since we are doing a Poisson regression, this number represents the expected value of discoveries in the year 1960. This means that we can estimate the probability of there being exactly 4 discoveries if we examine the Poisson distribution with an expected value (also known as a lambda) of 2.336768. Let's use dpois to see the probabilities of getting 0 to 6 discoveries if the lambda is 2.336768:

plot(0:6, dpois(0:6, pred_pr), type = "h")

So the probability of there being exactly 4 discoveries in 1960 is:

dpois(4, pred_pr)
#> [1] 0.1200621

i.e. almost exactly 12%

Upvotes: 1

Related Questions