Reputation: 3
How do you predict in mgcv::gam
when you've fitted a model that might contain random effects?
The other thread on this site with the "exclude" trick does not work for me (https://stats.stackexchange.com/questions/131106/predicting-with-random-effects-in-mgcv-gam)
ya <- rnorm(100, 0, 1)
yb <- rnorm(100,0,1.5)
yc <- rnorm(100, 0, 2)
yd <- rnorm(100, 0, 2.5)
yy <- c(ya,yb,yc,yd) #so, now we've got data from 4 different groups.
xx <- c(rep("a", 100), rep("b",100), rep("c",100),rep("d",100)) #groups
zz <- rnorm(400,0,1) #some other covariate
model <- gam(yy ~ zz + s(xx, bs = "re")) #the model
predictdata <- data.frame( zz = 5 ) #new data
predict(model, newdata = predictdata, exclude = "s(xx)") #prediction
and this produces the error
Error in model.frame.default(ff, data = newdata, na.action = na.act) :
variable lengths differ (found for 'xx')
In addition: Warning messages:
1: In predict.gam(model, newdata = predictdata, exclude = "s(xx)") :
not all required variables have been supplied in newdata!
2: 'newdata' had 1 row but variables found have 400 rows
My mgcv package is the latest.
EDIT:
If you change predictdata to
predictdata <- data.frame(zz = 5, xx = "f")
then it says
Error in predict.gam(model, newdata = predictdata, exclude = "s(xx)") :
f not in original fit
Upvotes: 0
Views: 2666
Reputation: 140
I experimented with your example and it seems that the 'exclude' statement does work, even though you have to specify in newdata values for the random effects that were included in the original dataset used to fit the model. This however, makes me a bit uneasy. Another caveat is that 'exclude' did not seem to work on a model with a variance structure that was estimated separately by group (I tried this with another dataset), i.e., something like s(xx, s="re", by=group). You might want to post or have the question moved to crossvalidated so that other statisticians/analysts can see it an perhaps provide a better answer.
Below is my code. Note that I changed the means for groups a and d, yet the overall mean should be around zero.
ya <- rnorm(100, 1, 1)
yb <- rnorm(100, 0,1.5)
yc <- rnorm(100, 0, 2)
yd <- rnorm(100, -1, 2.5)
yy <- c(ya,yb,yc,yd) #so, now we've got data from 4 different groups.
xx <- c(rep("a", 100), rep("b",100), rep("c",100),rep("d",100)) #groups
zz <- rnorm(400,0,1) #some other covariate
some.data= data.frame(yy,xx,zz)
model <- gam(yy ~ zz + s(xx, bs = "re"),data=some.data) #the model
# the intercept is the overall mean when zz is zero
summary(model)
predictdata <- data.frame(zz = c(0,0,0,0), xx =c("a","b","c","d")) #new data
#excluding random effects. Estimate should be the same for all and should be the intercept
predict(model, newdata = predictdata, exclude = "s(xx)")
#including random effects. Estimates should differ by group with 'a' larger and 'd' smaller
predict(model, newdata = predictdata)
Upvotes: 1