Incorrect estimates of lm alternatives

Question

Dirk Eddelbuettel provides alternatives to estimating a linear regression with the lm command. See: http://dirk.eddelbuettel.com/blog/2011/07/05/

However, he mentions:

"Strictly-speaking, it is the only one we can compare to lm.fit() which also uses a pivoting scheme. In the case of a degenerated model matrix, all the other methods, including the four fastest approaches, are susceptible to producing incorrect estimates."

Could someone illustrate this point by providing an example when the estimates are correct for lm, but not for the alternative approaches?

Dirk is no longer here · Accepted Answer

Install RcppArmadillo or RcppEigen and look at help(fastLm):

   ## case where fastLm breaks down
   dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
                    f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
   xtabs(~ f2 + f1, dd)     # one missing cell
   mm <- model.matrix(~ f1 * f2, dd)
   kappa(mm)                # large, indicating rank deficiency
   set.seed(1)
   dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
   summary(lm(y ~ f1 * f2, dd))     # detects rank deficiency
   summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients

We owe this example to Doug Bates.

Incorrect estimates of lm alternatives

Answers (1)

Related Questions