Silas Bergen
Silas Bergen

Reputation: 61

Error trying to predict without random effect from bam() output

I have a data set I am trying to fit with bam() in the mgcv package. The model has a binary outcome and I need to specify random intercepts for each animal ID. A subset of the data is below (my actual data is much larger with many more covariates):

dat2 <- read.csv('https://github.com/silasbergen/example_data/raw/main/dat2.csv')
dat2$Animal_id <- factor(dat2$Animal_id)
> head(dat2)
  Animal_id DEM_IA Anyrisk
1       105 279.94       0
2       105 278.68       0
3       106 329.13       0
4       106 329.93       0
5       106 332.25       0
6       106 333.52       0
> summary(dat2)
 Animal_id        DEM_IA         Anyrisk      
 105:     2   Min.   :156.3   Min.   :0.0000  
 106: 83252   1st Qu.:246.8   1st Qu.:0.0000  
 107: 22657   Median :290.1   Median :0.0000  
 108:104873   Mean   :284.8   Mean   :0.3619  
 109:142897   3rd Qu.:318.0   3rd Qu.:1.0000  
 110: 53967   Max.   :411.8   Max.   :1.0000 

I want to fit the model and predict for new data without the random effect:

library(mgcv)
mod <- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <-  data.frame(DEM_IA = c(280,320))
predict(mod, newdata = topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE)

But this throws an error:

Error in eval(predvars, data, env) : object 'Animal_id' not found

Why would it need Animal_id when I am specifically telling it to exclude this term from the prediction? This is also especially strange as I can run the similar examples in the ?random.effects mgcv help file, no problem, even if I modify those examples to use bam() instead of gam()! Any help would be greatly appreciated!

EDIT

I may have found a fix; apparently if using discrete=TRUE in the bam() model, then predict.bam() also uses discrete=TRUE which will not work with missing random effect, but this works:

mod<- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <-  data.frame(DEM_IA = c(280,320))
predict(mod,topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE,discrete=FALSE)

Output:

         1          2 
-0.4451066 -0.0285989 

Upvotes: 2

Views: 503

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226027

tl;dr work around this by putting something in for Animal_id, it doesn't matter what value you specify (not NA though ...)

Why? Can't say for sure without more digging in the code, but ... it's often convenient to use model.frame(formula, newdata) as a step toward computing the required model matrix. (For example, one could proceed by constructing the whole model matrix, then zeroing out columns to be ignored ...) Figuring out which terms can be deleted from the formula may be a separate, more difficult step. (I don't know why it works differently in bam and gam though ...)

This appears to work fine:

topred <-  data.frame(DEM_IA = c(280,320),
                      Animal_id=dat2$Animal_id[1])
predict(mod, newdata = topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE)

Check that it really doesn't matter what you specify for Animal_id:

res <- lapply(levels(dat2$Animal_id),
           function(i) {
             dd <- transform(topred, Animal_id=i)
               predict(mod, newdata = dd, 
                       exclude="s(Animal_id)",newdata.guaranteed = TRUE)
           })
do.call(rbind,res)

Results:

              1          2
[1,] -0.4451066 -0.0285989
[2,] -0.4451066 -0.0285989
[3,] -0.4451066 -0.0285989
[4,] -0.4451066 -0.0285989
[5,] -0.4451066 -0.0285989
[6,] -0.4451066 -0.0285989

Upvotes: 4

Related Questions