jay.sf
jay.sf

Reputation: 73342

How to predict after lm.wfit with new data?

lm.wfit (and .lm.fit) has considerable speed advantages over lm if we can do without the additional calculations of the latter.

fo <- mpg ~ hp
X <- model.matrix(fo, mtcars)

microbenchmark::microbenchmark(flm=lm <- 
                                 lm(fo, mtcars[mtcars$am == 1, ], 
                                    weights=rep(1, 13)),
                               flmw=lmw <- 
                                 lm.wfit(X[mtcars$am == 1, ], mtcars$mpg[mtcars$am == 1], 
                                         w=rep(1, 13)))
# Unit: microseconds
# expr     min       lq     mean   median       uq      max neval cld
#  flm 935.328 951.2735 994.7719 957.2695 1006.128 1333.339   100   b
# flmw  56.895  60.3400  66.2028  64.2940   66.463  186.250   100  a 
stopifnot(lm$coefficients == lmw$coefficients)

It's often gone well, but now I need to apply predict with the whole data, similar to this example.

predict(lm, mtcars)
# [...]
predict(lmw, mtcars)
# Error in UseMethod("predict") : 
#   no applicable method for 'predict' applied to an object of class "list"

Is that even possible after using this confined lm.wfit? If so, how?

Upvotes: 2

Views: 117

Answers (2)

Nathan Werth
Nathan Werth

Reputation: 5273

lm is slower than lm.fit and lm.wfit because it uses those functions internally. You've pulled some other internal work out by creating the model matrix outside the benchmark. That's fine if you expect to reuse the model matrix, but otherwise it's a misleading benchmark.

As for predict, it's a generic method. Because there's no predict.list method, it fails. If you want, you can write a method for a custom class, then assign that class to the returned value of lm.wfit.

E.g.:

my_lm_wfit <- function(...) {
  fit <- lm.wfit(...)
  class(fit) <- "my_lm_wfit"
  fit
}

predict.my_lm_wfit <- function(...) {
  # Do something...
}

Upvotes: 2

MrFlick
MrFlick

Reputation: 206411

Well, the help page for ?lm.wfit does warn against its use and that's probably for this exact reason. It doesn't store any of the information about formulas and column names that predict() normally uses to make sure your new data matches the variables types of the old data.

Instead, you could do some of the matrix multiplication yourself if you go through the same model.matrix() procedure.

model.matrix(fo, mtcars) %*% lmw$coefficients

You can see these are the same as the base predict with

all((model.matrix(fo, mtcars) %*% lmw$coefficients) == predict(lm, mtcars))
# [1] TRUE

Upvotes: 3

Related Questions