How to strip down the glm model?

Question

The object returned by glm contains residuals, fitted values, effects, qr$qr, linear.predictors, weights and so on, which add up to a huge object when the input is big enough.

How do I strip it down so that something like predict will still work?

Ideally, I want a function which would return a small function object equivalent to function(x) predict(my_model,data.frame(x=x)); something like as.stepfun for isoreg.

Noah · Accepted Answer

Most of the model components are descriptive, and are not necessary for predict to work. A helper function (HT: R-Bloggers) can be used to remove the fat:

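stripGlmLR <- function(cm) {
  # Copies of the input data: these grow with the number of training rows
  cm$y <- NULL
  cm$model <- NULL

  # Per-observation results, plus the QR matrix
  cm$residuals <- NULL
  cm$fitted.values <- NULL
  cm$effects <- NULL
  cm$qr$qr <- NULL  # drop only the matrix; the rest of cm$qr is kept
  cm$linear.predictors <- NULL
  cm$weights <- NULL
  cm$prior.weights <- NULL
  cm$data <- NULL

  # Family functions removed here; the others (such as linkinv) are kept
  cm$family$variance <- NULL
  cm$family$dev.resids <- NULL
  cm$family$aic <- NULL
  cm$family$validmu <- NULL
  cm$family$simulate <- NULL

  # Drop the environments attached to the terms and formula objects
  attr(cm$terms, ".Environment") <- NULL
  attr(cm$formula, ".Environment") <- NULL

  cm
}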
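
To check which components actually dominate the size before removing anything, the per-component sizes can be tabulated. This is only a rough sketch; the names d, fit and sizes are made up for illustration, and the data are random:

```r
# Small illustrative fit; `d`, `fit` and `sizes` are hypothetical names
d <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
fit <- glm(y ~ x, data = d)

# Size in bytes of each top-level component of the fitted object
sizes <- sapply(fit, object.size)

# Largest components first
head(sort(sizes, decreasing = TRUE))
```

The entries at the top of that listing are the ones worth setting to NULL first.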

Now you can apply stripGlmLR to your model for a reduction in size of more than four orders of magnitude (in this example):

traindata <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
testdata <- data.frame(x = rnorm(10))

mod1 <- glm(y~x, data= traindata)
mod2 <- stripGlmLR(mod1)

format(object.size(mod1), units = "Kb")
# [1] "492234.5 Kb"
format(object.size(mod2), units = "Kb")
# [1] "18.5 Kb"

all(predict(object = mod1, newdata = testdata) == 
    predict(object = mod2, newdata = testdata))
# [1] TRUE

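Note that the stripped object is only meant for predict with newdata. Other glm methods rely on the removed components (summary, for example, reads the residuals, the weights and qr$qr), so if you want the full suite of glm methods you will need to retain those components.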
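
If a bare function is preferred, as the question asks, one possible approach is to keep only the coefficients and the inverse link in a closure. The helper below, make_predictor, is a hypothetical name and not part of base R, and the sketch assumes a formula of the form y ~ x with a single numeric predictor and an intercept:

```r
# Hypothetical helper: returns a small function(x) that reproduces
# predict(model, data.frame(x = x)) for a model fitted as y ~ x.
make_predictor <- function(model) {
  beta <- unname(coef(model))       # intercept and slope
  linkinv <- model$family$linkinv   # maps the linear predictor to the response scale
  rm(model)                         # do not keep the full fit alive in the closure
  function(x, type = c("link", "response")) {
    type <- match.arg(type)
    eta <- beta[[1]] + beta[[2]] * x
    if (type == "response") linkinv(eta) else eta
  }
}

# Usage on a small random example
fit <- glm(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
f <- make_predictor(fit)
f(c(-1, 0, 1))
```

For formulas with several terms, factors or transformations this shortcut no longer applies, and stripping the glm object as above is the safer route.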
