majom
majom

Reputation: 8031

Incorrect estimates of lm alternatives

Dirk Eddelbuettel provides alternatives to estimating a linear regression with the lm command. See: http://dirk.eddelbuettel.com/blog/2011/07/05/

However, he mentions:

"Strictly-speaking, it is the only one we can compare to lm.fit() which also uses a pivoting scheme. In the case of a degenerated model matrix, all the other methods, including the four fastest approaches, are susceptible to producing incorrect estimates."

Could someone illustrate this point by providing an example when the estimates are correct for lm, but not for the alternative approaches?

Upvotes: 1

Views: 106

Answers (1)

Dirk is no longer here
Dirk is no longer here

Reputation: 368271

Install RcppArmadillo or RcppEigen and look at help(fastLm):

   ## case where fastLm breaks down
   dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
                    f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
   xtabs(~ f2 + f1, dd)     # one missing cell
   mm <- model.matrix(~ f1 * f2, dd)
   kappa(mm)                # large, indicating rank deficiency
   set.seed(1)
   dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
   summary(lm(y ~ f1 * f2, dd))     # detects rank deficiency
   summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients

We owe this example to Doug Bates.

Upvotes: 3

Related Questions