Reputation: 2929
I refer to this post, http://r.789695.n4.nabble.com/Questions-about-biglm-td878929.html, which discusses how to obtain VIFs using biglm.
Is there an alternative way of obtaining VIF from the object produced by biglm?
Thanks for your help
Upvotes: 2
Views: 3270
Reputation: 174813
For simple models, this is relatively easy following the code in the vif() method for "lm" objects in the car package, as John Fox suggested in the R-Help thread you linked to. You can't use the car package directly as it uses the model matrix and that isn't going to be possible with biglm(). To illustrate how to do this, consider the simple example from ?biglm:
require(biglm)
data(trees)
ff <- log(Volume) ~ log(Girth) + log(Height)
chunk1 <- trees[1:10, ]
chunk2 <- trees[11:20, ]
chunk3 <- trees[21:31, ]
a <- biglm(ff, chunk1)
a <- update(a, chunk2)
a <- update(a, chunk3)
The fitted model is in a, from which we extract the variance-covariance matrix of the parameters, drop the intercept, and compute the correlation matrix R and its determinant:
v <- vcov(a)
## drop intercept
v <- v[-1, -1, drop = FALSE]
R <- cov2cor(v)
detR <- det(R)
Next, create a named vector to hold the VIFs:
res <- numeric(length = ncol(v))
names(res) <- colnames(v)
Finally, loop over the model terms (minus the intercept) and compute the VIF for each term:
for (i in seq_len(ncol(v))) {
    res[i] <- det(R[i, i, drop = FALSE]) * det(R[-i, -i, drop = FALSE]) / detR
}
This results in:
> res
log(Girth) log(Height)
1.391027 1.391027
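The steps above can be wrapped in a small helper. This is only a sketch: vif_from_vcov() is a hypothetical name, not a function from biglm or car, and it assumes one coefficient per model term and an intercept row/column named "(Intercept)". It is demonstrated on an lm() fit so that it runs with base R alone; the matrix from vcov(a) could be passed in the same way, assuming it carries the same dimnames.

```r
# Hypothetical helper (not part of biglm or car): VIFs from a coefficient
# variance-covariance matrix, assuming one coefficient per model term.
vif_from_vcov <- function(v) {
  # drop the intercept row/column, if present
  keep <- colnames(v) != "(Intercept)"
  v <- v[keep, keep, drop = FALSE]

  # correlation matrix of the coefficient estimates and its determinant
  R <- cov2cor(v)
  detR <- det(R)

  # same determinant ratio as in the for() loop above
  res <- vapply(seq_len(ncol(R)), function(i) {
    det(R[i, i, drop = FALSE]) * det(R[-i, -i, drop = FALSE]) / detR
  }, numeric(1))
  names(res) <- colnames(R)
  res
}

# Base-R demonstration using the same formula as the biglm example
fit <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
vifs <- vif_from_vcov(vcov(fit))
print(vifs)
```

With only two predictors, both VIFs are necessarily equal, which makes for a quick sanity check.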
If we load the car package and use it to compute VIFs for the same model fitted using lm(), we can see that it gives the same output:
> require(car)
> mod <- lm(ff, data = trees)
> vif(mod)
log(Girth) log(Height)
1.391027 1.391027
car's vif() is a bit cleverer than the code I show, as it works out whether a model term is represented by more than one coefficient (a factor with several levels, say, or a polynomial term), whereas my code assumes one coefficient per term. In such circumstances, a model covariate will occupy more than one row/column of the variance-covariance matrix v, and you need to retain/exclude all rows/columns belonging to that term when computing the determinants in the for() loop. Which rows/columns belong to which term can be worked out from the dimnames of the variance-covariance matrix; I leave that as an exercise.
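A rough sketch of that extension, under stated assumptions: gvif_from_vcov() and its term_cols argument are hypothetical names, and the mapping from terms to coefficient names is assumed to be supplied by hand rather than derived from the fitted object. The quantity computed is the same determinant ratio as before, only with a block of rows/columns per term. Again it is shown on an lm() fit so it runs with base R alone.

```r
# Hypothetical helper: determinant-ratio VIFs where a term may span
# several coefficients. `term_cols` is a named list mapping each model
# term to the names of its coefficients (supplied by the user).
gvif_from_vcov <- function(v, term_cols) {
  # drop the intercept row/column, if present
  keep <- colnames(v) != "(Intercept)"
  v <- v[keep, keep, drop = FALSE]

  R <- cov2cor(v)
  detR <- det(R)

  vapply(term_cols, function(cols) {
    idx <- match(cols, colnames(R))
    stopifnot(!anyNA(idx))  # every listed coefficient must exist in v
    # keep all rows/columns of the term in one determinant,
    # exclude all of them in the other
    det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / detR
  }, numeric(1))
}

# Base-R demonstration: factor(cyl) contributes two coefficients
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
terms_map <- list(
  wt            = "wt",
  `factor(cyl)` = c("factor(cyl)6", "factor(cyl)8")
)
gvifs <- gvif_from_vcov(vcov(fit), terms_map)
print(gvifs)
```

For a single-coefficient term this reduces to the ordinary VIF, so the earlier loop is the special case where every entry of term_cols has length one.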
When testing this, fit your model to a small random sample of the data using both biglm() and lm(), compute the VIFs using car's vif() on the resulting "lm" object and by hand on the "biglm" object, and check that they concur.
Upvotes: 6