Reputation: 387
Fitting the same model with bam
using methods "fREML" and "REML" gave me close results, but the deviance explained is rather different as returned by summary.gam
.
With "fREML" the quantity is ~3.5% (not good) while with "REML" it is ~50% (not that bad). How can it be possible? Which one is correct?
Unfortunately, I cannot provide a simple reproducible example.
#######################################
## method = "fREML", discrete = TRUE ##
#######################################
Family: binomial
Link function: logit
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.0026 0.2199 -22.75 <2e-16
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Var1) 1.00 1.001 17.54 2.82e-05
s(RandomVar) 16.39 19.000 145.03 < 2e-16
R-sq.(adj) = 0.00349 Deviance explained = 3.57%
fREML = 2.8927e+05 Scale est. = 1 n = 312515
########################################
## method = "fREML", discrete = FALSE ##
########################################
Family: binomial
Link function: logit
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.8941 0.2207 -22.18 <2e-16
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Var1) 1.008 1.016 17.44 3.09e-05
s(RandomVar) 16.390 19.000 144.86 < 2e-16
R-sq.(adj) = 0.00349 Deviance explained = 3.57%
fREML = 3.1556e+05 Scale est. = 1 n = 312515
#####################################################
## method = "REML", discrete method not applicable ##
#####################################################
Family: binomial
Link function: logit
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.8928 0.2205 -22.19 <2e-16
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Var1) 1.156 1.278 16.57 8.53e-05
s(RandomVar) 16.379 19.000 142.60 < 2e-16
R-sq.(adj) = 0.0035 Deviance explained = 50.8%
-REML = 3.1555e+05 Scale est. = 1 n = 312515
Upvotes: 4
Views: 1964
Reputation: 73265
This issue can be back tracked to mgcv_1.8-23
. Its changlog for reads:
* bam extended family extension had introduced a bug in null deviance
computation for Gaussian additive case when using methods other than fREML
or GCV.Cp. Fixed.
It now turns out that the patch is successful for Gaussian case, but not for non-Gaussian cases.
Let me first provide a reproducible example, as your question does not have one.
set.seed(0)
x <- runif(1000)
## the linear predictor is a 3rd degree polynomial
p <- binomial()$linkinv(0.5 + poly(x, 3) %*% rnorm(3) * 20)
## p is well spread out on (0, 1); check `hist(p)`
y <- rbinom(1000, 1, p)
library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.
fREML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "fREML")
REML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "REML")
GCV <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "GCV.Cp")
## explained.deviance = (null.deviance - deviance) / null.deviance
## so in this example we get negative explained deviance for "REML" method
unlist(REML[c("null.deviance", "deviance")])
#null.deviance deviance
# 181.7107 1107.5241
unlist(fREML[c("null.deviance", "deviance")])
#null.deviance deviance
# 1357.936 1107.524
unlist(GCV[c("null.deviance", "deviance")])
#null.deviance deviance
# 1357.936 1108.108
Null deviance can't be smaller than deviance (TSS can not be smaller than RSS), so "REML" method of bam
fails to return the correct Null deviance here.
I have located the problem at line 1350 of mgcv_1.8-24/R/bam.r
:
object$family <- object$fitted.values <- NULL
Actually, it should be
object$null.deviance <- object$fitted.values <- NULL
For methods other than "GCV.Cp" and "fREML", bam
relies on gam
for estimation, after reducing the large n x p
model matrix to a p x p
matrix (n
: number of data; p
: number of coefficients). Since this new model matrix does not have a natural interpretation, many quantities returned by gam
should be invalidated (apart from estimated smooth parameters). It was a typo for Simon to put family
.
I build a patched version and it turns out to fix the bug. I will tell Simon to get it fixed in the next release.
Upvotes: 5