Using speedglm on a data frame with a deleted factor

Question

I am trying to use the speedglm package for R to estimate regression models. In general the results are the same as with base R's glm function, but speedglm behaves unexpectedly when I remove every row belonging to a given factor level from a data.frame. For example, see the code below:

library(speedglm)

dat1 <- data.frame(y=rnorm(100), x1=gl(5, 20))
dat2 <- subset(dat1, x1!=1)

glm("y ~ x1", dat2, family="gaussian")
Coefficients:
(Intercept)          x13          x14          x15  
    -0.2497       0.6268       0.3900       0.2811 

speedglm(as.formula("y ~ x1"), dat2)
Coefficients:
(Intercept)          x12          x13          x14          x15  
    0.03145     -0.28114      0.34563      0.10887           NA

Here the two functions give different results because all rows with x1 == 1 have been removed from dat2. Had I used dat1 instead, the results would have been identical. Is there a way to make speedglm behave like glm when processing data like dat2?

sckott · Accepted Answer

I think droplevels() is the key.

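Compare str(droplevels(dat2)) with str(dat2): even though the rows with x1 == 1 are gone, level 1 is still listed among the factor levels of dat2$x1. glm drops unused levels when it builds its model frame, so level 2 becomes the reference category; speedglm apparently keeps the empty level 1 as the reference, which makes the x12 to x15 dummies add up to the intercept column, hence the NA for x15.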

So speedglm(as.formula("y ~ x1"), droplevels(dat2)) should give the same coefficients as glm("y ~ x1", dat2, family="gaussian").
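A minimal base-R sketch of the effect described in the answer. It does not load speedglm; the seed and the names dat3, mm2 and mm3 are illustrative additions, and the link to speedglm's NA assumes it builds its design matrix from the factor levels as stored, which the output in the question suggests but which is not confirmed here.

```r
set.seed(42)  # arbitrary seed, added only to make the sketch reproducible
dat1 <- data.frame(y = rnorm(100), x1 = gl(5, 20))
dat2 <- subset(dat1, x1 != "1")

# Subsetting removes the rows but keeps level "1" in the factor.
levels(dat2$x1)   # "1" "2" "3" "4" "5"
table(dat2$x1)    # level "1" now has a count of 0

# droplevels() removes factor levels that no longer occur.
dat3 <- droplevels(dat2)
levels(dat3$x1)   # "2" "3" "4" "5"

# model.matrix() keeps unused levels, so level "1" stays the reference and
# the dummies x12..x15 add up to the intercept column: 5 columns, rank 4.
mm2 <- model.matrix(~ x1, data = dat2)
qr(mm2)$rank

# A plain least-squares fit on that matrix reports NA for x15, mirroring the
# speedglm output in the question (assumed, not verified, to be the cause).
coef(lm.fit(mm2, dat2$y))

# After droplevels() the design matrix has full rank (4 columns, rank 4).
mm3 <- model.matrix(~ x1, data = dat3)
qr(mm3)$rank

# glm() drops unused levels itself, so both data frames give the same fit.
all.equal(coef(glm(y ~ x1, data = dat2)), coef(glm(y ~ x1, data = dat3)))
```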
