Reputation: 1371
Problem: I want to fit a linear model with interaction terms. After estimating a "full" model, I want to remove the non-significant interaction terms. However, calling update(model, . ~ . - interaction) on my model changes nothing. Please help.
Data:
library(car)
data(Prestige)
Prestige_compl <- Prestige[complete.cases(Prestige), ]  # drop rows containing NAs
attach(Prestige_compl)
Model:
modR0 <- lm(prestige ~
income +
education +
women +
income * type +
education * type +
women * type ,
data = Prestige_compl)
# fit a linear model with interaction terms
summary(modR0)
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -5.822e+00  7.311e+00  -0.796  0.42803
income              4.692e-03  6.691e-04   7.013 5.00e-10 ***
education           1.625e+00  9.163e-01   1.773  0.07971 .
women               1.343e-01  4.656e-02   2.885  0.00494 **
typeprof            2.436e+01  1.351e+01   1.803  0.07496 .
typewc             -2.178e+01  1.727e+01  -1.261  0.21081
income:typeprof    -4.144e-03  7.132e-04  -5.810 1.03e-07 ***
income:typewc      -7.527e-04  1.814e-03  -0.415  0.67924
education:typeprof  1.512e+00  1.235e+00   1.224  0.22423
education:typewc    2.123e+00  2.190e+00   0.970  0.33491
women:typeprof     -1.601e-01  6.506e-02  -2.460  0.01588 *
women:typewc        2.893e-02  1.117e-01   0.259  0.79619
Remove non-significant interaction terms:
modR1 <- update(modR0, .~. -women:typewc)
summary(modR1)
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -5.822e+00  7.311e+00  -0.796  0.42803
income              4.692e-03  6.691e-04   7.013 5.00e-10 ***
education           1.625e+00  9.163e-01   1.773  0.07971 .
women               1.343e-01  4.656e-02   2.885  0.00494 **
typeprof            2.436e+01  1.351e+01   1.803  0.07496 .
typewc             -2.178e+01  1.727e+01  -1.261  0.21081
income:typeprof    -4.144e-03  7.132e-04  -5.810 1.03e-07 ***
income:typewc      -7.527e-04  1.814e-03  -0.415  0.67924
education:typeprof  1.512e+00  1.235e+00   1.224  0.22423
education:typewc    2.123e+00  2.190e+00   0.970  0.33491
women:typeprof     -1.601e-01  6.506e-02  -2.460  0.01588 *
women:typewc        2.893e-02  1.117e-01   0.259  0.79619
Why is the interaction term that should be removed still there?
Upvotes: 4
Views: 3379
Reputation: 3462
women:typewc is not an actual term in the model. The real interaction term is women:type, which expands into two coefficients because type is a factor with three categories. Because the formula contains no term called women:typewc, update() has nothing to remove and returns the same model. Remember that a dummy-variable coefficient (including in interactions) always represents a difference between that category and the default (reference) category. Removing only one category is likely to change the effect of the "default" category, so there is no consistent way to interpret an interaction (or a dummy variable) if you remove only some of its categories from the model. You should either remove all categories or keep all of them.
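To see the difference between terms and coefficients, you can inspect the formula itself. This is only a small sketch: the names f, f_same and f_drop are made up for illustration, and it works on the bare formula from the question, so no data are needed.

```r
# Formula from the question; terms() can analyse it without any data
f <- prestige ~ income + education + women +
  income * type + education * type + women * type

# The model's terms: the interaction appears as "women:type",
# not as the per-level coefficient names such as "women:typewc"
attr(terms(f), "term.labels")

# Subtracting a name that is not a term leaves the terms unchanged
f_same <- update(f, . ~ . - women:typewc)
identical(attr(terms(f), "term.labels"),
          attr(terms(f_same), "term.labels"))

# Subtracting the real term removes it
f_drop <- update(f, . ~ . - women:type)
"women:type" %in% attr(terms(f_drop), "term.labels")
```

The same logic applies when update() is called on the fitted lm object, since it rewrites the model's formula before refitting.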
If you use
modR1 <- update(modR0, .~. -women:type)
the interaction term will be removed with all its categories. Note, however, that one of its two coefficients (women:typeprof, p = 0.016) was in fact statistically significant in your output.
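Rather than judging the interaction by its individual coefficients, one option is to test the whole term at once by comparing the two nested models. The following is a sketch, not the only way to do it: it assumes the carData package (which ships Prestige and is loaded by car) is installed, and it writes the formula as (income + education + women) * type, which expands to the same terms as the formula in the question.

```r
# Prestige lives in carData; this assumes that package is installed
data("Prestige", package = "carData")
Prestige_compl <- Prestige[complete.cases(Prestige), ]

# Same model as in the question, written more compactly
modR0 <- lm(prestige ~ (income + education + women) * type,
            data = Prestige_compl)

# Drop the whole women:type interaction (both of its coefficients)
modR1 <- update(modR0, . ~ . - women:type)

# Coefficients present in the full model but not in the reduced one
dropped <- setdiff(names(coef(modR0)), names(coef(modR1)))
print(dropped)

# F test of the reduced model against the full model,
# i.e. a joint test of all women:type coefficients
cmp <- anova(modR1, modR0)
print(cmp)
```

drop1(modR0, test = "F") gives the analogous test for each of the three interaction terms in one call.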
Upvotes: 4