Reputation: 61
I have a data set I am trying to fit with bam() in the mgcv package. The model has a binary outcome and I need to specify random intercepts for each animal ID. A subset of the data is below (my actual data is much larger with many more covariates):
dat2 <- read.csv('https://github.com/silasbergen/example_data/raw/main/dat2.csv')
dat2$Animal_id <- factor(dat2$Animal_id)
> head(dat2)
Animal_id DEM_IA Anyrisk
1 105 279.94 0
2 105 278.68 0
3 106 329.13 0
4 106 329.93 0
5 106 332.25 0
6 106 333.52 0
> summary(dat2)
Animal_id DEM_IA Anyrisk
105: 2 Min. :156.3 Min. :0.0000
106: 83252 1st Qu.:246.8 1st Qu.:0.0000
107: 22657 Median :290.1 Median :0.0000
108:104873 Mean :284.8 Mean :0.3619
109:142897 3rd Qu.:318.0 3rd Qu.:1.0000
110: 53967 Max. :411.8 Max. :1.0000
I want to fit the model and predict for new data without the random effect:
library(mgcv)
mod <- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <- data.frame(DEM_IA = c(280,320))
predict(mod, newdata = topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE)
But this throws an error:
Error in eval(predvars, data, env) : object 'Animal_id' not found
Why would it need Animal_id
when I am specifically telling it to exclude this term from the prediction? This is also especially strange as I can run the similar examples in the ?random.effects
mgcv
help file, no problem, even if I modify those examples to use bam() instead of gam()! Any help would be greatly appreciated!
EDIT
I may have found a fix; apparently if using discrete=TRUE
in the bam()
model, then predict.bam()
also uses discrete=TRUE
which will not work with missing random effect, but this works:
mod<- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <- data.frame(DEM_IA = c(280,320))
predict(mod,topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE,discrete=FALSE)
Output:
1 2
-0.4451066 -0.0285989
Upvotes: 2
Views: 503
Reputation: 226027
tl;dr work around this by putting something in for Animal_id
, it doesn't matter what value you specify (not NA
though ...)
Why? Can't say for sure without more digging in the code, but ... it's often convenient to use model.frame(formula, newdata)
as a step toward computing the required model matrix. (For example, one could proceed by constructing the whole model matrix, then zeroing out columns to be ignored ...) Figuring out which terms can be deleted from the formula may be a separate, more difficult step. (I don't know why it works differently in bam
and gam
though ...)
This appears to work fine:
topred <- data.frame(DEM_IA = c(280,320),
Animal_id=dat2$Animal_id[1])
predict(mod, newdata = topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE)
Check that it really doesn't matter what you specify for Animal_id
:
res <- lapply(levels(dat2$Animal_id),
function(i) {
dd <- transform(topred, Animal_id=i)
predict(mod, newdata = dd,
exclude="s(Animal_id)",newdata.guaranteed = TRUE)
})
do.call(rbind,res)
Results:
1 2
[1,] -0.4451066 -0.0285989
[2,] -0.4451066 -0.0285989
[3,] -0.4451066 -0.0285989
[4,] -0.4451066 -0.0285989
[5,] -0.4451066 -0.0285989
[6,] -0.4451066 -0.0285989
Upvotes: 4