Reputation: 1
I want to predict a fitted poisson glm on newdata, given that it was fitted using offset=log(Exposure), but I get confused with the inclusion of the term "offset" inside of predict.glm(). I do not understand if it is necessary to include it and, if I want to include it, how should I do that with an equal impact as the offset used in fitting?
To clear my doubts, I created a dataset from scratch and I decided to compare three possible approaches:
However, when comparing the results, I was surprised to find all three approaches resulted in the same predictions. Conclusion: offset term in predict.glm() is ignored? I still ask this, because I am a new to R and my procedure to check it may be wrong.
Also, even though similar questions may have been posted, answers are scarce and inconsistent with this result.
set.seed(1)
exp<-c(1:100)/100
dummy<-rep(c(0,1), each=50)
y<-floor(rpois(100,dummy*exp*10)) #count data
DTtest<-as.data.frame(exp)
DTtest$D<-dummy
DTtest$y<-y
gtest<-glm(y~D, offset=log(exp), family=poisson(link="log"), data=DTtest, trace=TRUE)
DTtest1<-predict(gtest, newdata=DTtest, type="response", offset=log(exp)) #predict1
DTtest2<-predict(gtest, newdata=DTtest, type="response", offset=exp) #predict2
DTtest3<-predict(gtest, newdata=DTtest, type="response") #predict3
sum(DTtest1==DTtest2)
sum(DTtest2==DTtest3)
sum(DTtest1==DTtest3)
As a result for the sums I got: 100, 100, 100, which is equal to the length of the data used to predict (I also used DTtest as newdata). This means all predictions have the same value between the 3 approaches.
I was expecting the values to differ due to different impact of the offset explicit in predict(), but the result says otherwise.
Upvotes: 0
Views: 190
Reputation: 226577
Yes, offset
is ignored.
The help package for ?predict.glm
says only:
...
: further arguments passed to or from other methods.
This is not very helpful. However, if we look at the code for predict.glm
we don't see ...
used anywhere (it would be passed to another function). (If you insist on a code-based solution, we can search like this:
any(grepl("...", deparse(body(stats:::predict.glm)), fixed = TRUE))
## FALSE
If you wanted to predict with other values of the offset you would set up a newdata
data frame with different values of exp
. (By the way, it's probably a bad idea to use exp
as a variable name since it's the same as the name of the built-in exponential function ... most of the time you can get away with this but occasionally it will cause confusing errors.)
Upvotes: 1