Reputation: 1673
I'm interested in particular effects (or parameters) of a linear model. For example, taking the iris dataset, I would be interested in:
Sepal.Width
modulates the effect of versicolor and virginica (compared to the reference level setosa).Sepal.Width
and Sepal.Length
in each of the three levels of species
.The only way I found to do it is to fit two models, one with interaction and the other with nested interaction:
fit <- lm(Sepal.Length ~ Species * Sepal.Width, data=iris)
summary(fit) # The two last lines
fit <- lm(Sepal.Length ~ Species / Sepal.Width, data=iris)
summary(fit) # The three last lines
Nevertheless, fitting twice the same model seems relatively inefficient (especially when fitting models that take a long time to compute).
Is there any way to obtain, from a single model, the equivalent of the two last lines of the first model and three last lines of the second model? thanks
Upvotes: 2
Views: 188
Reputation: 1651
If you are interested in the coefficient estimates from the summary, then from a single model, the equivalent of the two last lines of the first model and three last lines of the second model can be obtained through addition. Let me explain.
In reality, both summaries are giving you the same information, just in different ways. For the first summary, here is my output:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.6390 0.5715 4.618 8.53e-06 ***
Speciesversicolor 0.9007 0.7988 1.128 0.261
Speciesvirginica 1.2678 0.8162 1.553 0.123
Sepal.Width 0.6905 0.1657 4.166 5.31e-05 ***
Speciesversicolor:Sepal.Width 0.1746 0.2599 0.672 0.503
Speciesvirginica:Sepal.Width 0.2110 0.2558 0.825 0.411
This is the result from the fit when setosa is dropped from the model. This automatically occurs in R because when you pass in a factor to a linear model, you only need n-1
parts to describe it since the base-level part (in this case setosa) is entirely collinear with the other two factors.
In the second summary fit, this is what I see.
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.6390 0.5715 4.618 8.53e-06 ***
Speciesversicolor 0.9007 0.7988 1.128 0.261
Speciesvirginica 1.2678 0.8162 1.553 0.123
Speciessetosa:Sepal.Width 0.6905 0.1657 4.166 5.31e-05 ***
Speciesversicolor:Sepal.Width 0.8651 0.2002 4.321 2.88e-05 ***
Speciesvirginica:Sepal.Width 0.9015 0.1948 4.628 8.16e-06 ***
In this model, despite being entirely collinear, the base-level setosa is kept in the model. Normally, if one were to keep an entirely collinear variable in a model, R would return an error because it would complain that it can't invert the matrix to find the best-fit for the desired formula (this is because the matrix is not full-rank).
But R is okay with the model because, well, it returned it successfully. This means the matrix was invertible because, despite having all three levels of the factor in the interaction terms, it was still a full-rank matrix. Using this knowledge, I thought that the information of whatever was "missing" from the second model was just baked into the first model.
And that's exactly what I saw. Look at the following equations, with the coefficient from model 1 on the left side of the equation, and the coefficient from model 2 on the right side of the equation.
Sepal.Width = Speciessetosa:Sepal.Width
Sepal.Width + Speciesversicolo:Sepal.Width = Speciesversicolor:Sepal.Width
Sepal.Width + Speciesvirginica:Sepal.Width = Speciesvirginica:Sepal.Width
Therefore, to obtain the coefficient estimates from the last two lines of the first model and the last three lines of the second model, just simply take the summary of the first model, and then perform the three equations I wrote above to obtain the "extra" information stored in the last three lines.
Upvotes: 2