Reputation: 2666
I am fitting a model using a random site-level effect using a generalized additive model, implemented in the mgcv
package for R. I had been doing this using the function gam()
however, to speed things up I need to shift to the bam()
framework, which is basically the same as gam()
, but faster. I further sped up fitting by passing the options bam(nthreads = N, discrete=T)
, where nthreads
is the number of cores on my machine. However, when I use the discretization option, and then try to make predictions with my model on new data, while ignoring the random effect, I consistent get an error.
Here is code to generate example data and reproduce the error.
library(mgcv)
#generate data.
N <- 10000
x <- runif(N,0,1)
y <- (0.5*x / (x + 0.2)) + rnorm(N)*0.1 #non-linear relationship between x and y.
#uninformative random effect.
random.x <- as.factor(do.call(paste0, replicate(2, sample(LETTERS, N, TRUE), FALSE)))
#fit models.
fit1 <- gam(y ~ s(x) + s(random.x, bs = 're')) #this one takes ~1 minute to fit, rest faster.
fit2 <- bam(y ~ s(x) + s(random.x, bs = 're'))
fit3 <- bam(y ~ s(x) + s(random.x, bs = 're'), discrete = T, nthreads = 2)
#make predictions on new data.
newdat <- data.frame(runif(200, 0, 1))
colnames(newdat) <- 'x'
test1 <- predict(fit1, newdata=newdat, exclude = c("s(random.x)"), newdata.guaranteed = T)
test2 <- predict(fit2, newdata=newdat, exclude = c("s(random.x)"), newdata.guaranteed = T)
test3 <- predict(fit3, newdata=newdat, exclude = c("s(random.x)"), newdata.guaranteed = T)
Making predictions with the third model which uses discretization throws this error (which the other two do not):
Error in model.frame.default(object$dinfo$gp$fake.formula[-2], newdata) :
variable lengths differ (found for 'random.x')
In addition: Warning message:
'newdata' had 200 rows but variables found have 10000 rows
How can I go about making predictions for a new dataset using the model fit with discretization?
Upvotes: 2
Views: 318
Reputation: 174918
newdata.gauranteed
doesn't seem to be working for bam()
models with discrete = TRUE
. You could email the author and maintainer of mgcv and send him the reproducible example so he can take a look. See ?bug.reports.mgcv
.
You probably want
names(newdat) <- "x"
as data frames have names
.
But the workaround is just to pass in something for random.x
newdat <- data.frame(x = runif(200, 0, 1), random.x = random.x[[1]])
and then do your call to generate test3
and it will work.
The warning message and error are the result of you not specifying random.x
in the newdata and then mgcv looking for random.x
and finding it in the global environment. You should really gather that variables into a data frame and use the data
argument when you are fitting your models, and try not to leave similarly named objects lying around in your global environment.
Upvotes: 2