klair
klair

Reputation: 41

Prediction intervals for poisson regression on R

I've tryed both the approaches but I find difficulties with both.. I try to explain better which is my problem before I tell you my questions with the two approaches.

I have the dataset "acceptances" in which I have the number of acceptances for everyday in an hospital with the indipendent variables described before. The hospital has three places in which we do the visits..So in my dataset I have 3 rows per day one for every place. The dataset seems like:

Date        Place    NumerAccept    weekday month   NoConvention    Rain

2008-01-02  Place1        203       wed     Gen         0             1
2008-01-02  Place2         70       wed     Gen         0             1
2008-01-02  Place3          9       wed     Gen         0             1
2008-01-03  Place1        345       thu     Gen         0             1
2008-01-03  Place2         24       thu     Gen         0             1
2008-01-03  Place3         99       thu     Gen         0             1
2008-01-04  Place1        339       fri     Gen         0             0
2008-01-04  Place2         36       fri     Gen         0             0
2008-01-04  Place3        101       fri     Gen         0             0

.... and so on... I have the dataset until yesterday, so the last three rows are the acceptances of yesterday 29 of July 2013. Now I do my Poisson regression:

poisson_reg=glm(NumeberAccept ~ 1 + weekday + month + place + NoConvention + Rain, 
                    family = poisson(link = log), data = acceptances)

Now for my predictions I create a new dataset acceptances_2 from which I want to calculate the prediction interval for the Number of Acceptances for the next 2 months!! So the first row will be the number of acceptations today, and the last row will be the acceptances on September 29.


I don't know if this question has already an answer, but I wasn't able to find it. I'm trying to do a Poisson regression in R and I want to obtain the prediction intervals. I saw that the predict function for lm gives it writing 'interval="prediction"' but it doesn't work with predict.glm!

Does someone know if there is a way to have those prediction intervals?? Can you type the code if you have some examples?

So I have to count the number of everyday acceptances in an hospital and I have the following code:

poisson_reg=glm(NumeberAccept ~ 1 + weekday + month + place + NoConvention + Rain, 
                family = poisson(link = log), data = dataset)
summary(poisson_reg)

Now if I type in R predict(poisson_reg, newdata, type="responce") I have the prediction for the number of acceptances for everyday but I need the prediction intervals too! I saw that for an object of class "lm" in the predict call you can write: predict(poisson_reg, newdata, interval="prediction") and it gives the 95% prediction intervals. Is there a way to obtain the same with an object of class "glm"?

Upvotes: 4

Views: 5266

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226182

This is perhaps more of a statistical question than a programming question, but:

Stealing example data from a previous question:

ex <- read.table(
  header=TRUE, text='
Number.Accepted  Weekday    Month   Place
  20    6   8   1
  16    7   8   1
  12    4   8   2
  11    7   1   1
  12    1   4   1
  12    7   10  2
  13    5   6   2
')
ex.glm <- glm(Number.Accepted ~ Weekday + Month + Place,
              family = poisson, data = ex)

Data frame for which we want prediction intervals:

newdata <- data.frame(Weekday=c(5,6),Month=c(9,9),Place=c(1,1))

Something like this:

bootSimFun <- function(preddata,fit,data) {
    bdat <- data[sample(seq(nrow(data)),size=nrow(data),replace=TRUE),]
    bfit <- update(fit,data=bdat)
    bpred <- predict(bfit,type="response",newdata=preddata)
    rpois(length(bpred),lambda=bpred)
}

You can also use replicate() from base R, but plyr::raply() is convenient:

library(plyr)
set.seed(101)
simvals <- raply(500,bootSimFun(preddata=newdata,fit=ex.glm,data=ex))
t(apply(simvals,2,quantile,c(0.025,0.975)))
##    2.5% 97.5%
## 1 7.000    40
## 2 7.475    36

Upvotes: 5

Gabi
Gabi

Reputation: 1343

Consider the Zelig package. See the Poisson vignette here -- http://rss.acs.unt.edu/Rdoc/library/Zelig/doc/poisson.pdf.

Zelig has a unified approach not just to modeling (for that, glm() with its various link functions would be sufficient) but also to extracting and plotting quantities of interest. In particular, to model the predicted range -- as opposed to just the expected range -- you must simulate both your coefficients (the systematic component) and your error term (the stochastic component). Simply averaging over the error term, which I think is what predict.glm() does, will give you the expected range, which is narrower.

Zelig has a function sim() that simulates both the systematic and the stochastic components, and outputs memory objects that you can use to plot both the predicted and the expected range. It also has a function setx() that you can use before sim() if you want to simulate your prediction uncertainty for given values of your explanatory variables. See here -- http://rss.acs.unt.edu/Rdoc/library/Zelig/html/setx.html.

It all started with this paper: http://gking.harvard.edu/files/abs/making-abs.shtml. Zelig is basically what Clarify grew up into.

Upvotes: 3

Related Questions