Reputation: 933
Creating a design matrix for linear models gave me an output that I didn't understand. Say I want to add two grouping variables:
model.matrix(~ factor(c(0,0,0,0,1,1)) + factor(c(0,0,1,1,0,0)))
This creates a three-column design matrix whose first column is the intercept. But when I suppress the intercept:
model.matrix(~ 0 + factor(c(0,0,0,0,1,1)) + factor(c(0,0,1,1,0,0)))
three columns are still created, but now the first and second columns are the 0 and 1 indicators of the same variable.
Why does this happen?
Upvotes: 4
Views: 3845
Reputation: 8252
When you put a factor in model.matrix, it includes dummies for all the levels it can; when you have a two-level factor, you have dummies (indicators) for both levels.
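As a small illustration of that point, here is a sketch with a single two-level factor and no intercept. The variable names `g` and `m` are only for this example and are not from the question.

```r
# One two-level factor, intercept suppressed with "0 +".
g <- factor(c(0, 0, 0, 0, 1, 1))
m <- model.matrix(~ 0 + g)

# Both levels get their own indicator column.
print(colnames(m))

# Every row has exactly one indicator switched on, so the two
# columns add up to a column of ones.
print(rowSums(m))
```

Because the two indicators sum to a column of ones, they carry the same information an intercept column would.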
With no intercept, the first factor can include all its levels, so it does. When there's an intercept, including all of them would lead to perfect multicollinearity (the indicators for a factor sum to a constant column, which is exactly what the intercept is), so by default the indicator for the first level of the factor is omitted.
With the second factor, it can't include all its levels in either case, because there will be multicollinearity with either the intercept (when it's in the model) or the first factor (when it isn't).
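The two cases can be checked directly. This is a minimal sketch using the factors from the question; the names `f1`, `f2`, `with_int`, `no_int` and `all_dummies` are introduced here for illustration, and the default treatment contrasts are assumed.

```r
# The two grouping variables from the question.
f1 <- factor(c(0, 0, 0, 0, 1, 1))
f2 <- factor(c(0, 0, 1, 1, 0, 0))

with_int <- model.matrix(~ f1 + f2)      # intercept kept
no_int   <- model.matrix(~ 0 + f1 + f2)  # intercept suppressed

# With an intercept, the first level of each factor is dropped.
print(colnames(with_int))

# Without it, f1 keeps both levels; f2 still loses its first level.
print(colnames(no_int))

# The two f1 indicators add up to the intercept column.
print(all(no_int[, "f10"] + no_int[, "f11"] == with_int[, "(Intercept)"]))

# Forcing in the omitted indicator for f2 adds a fourth column
# but leaves the rank at 3, i.e. the column is redundant.
all_dummies <- cbind(no_int, f20 = as.numeric(f2 == "0"))
print(ncol(all_dummies))
print(qr(all_dummies)$rank)
```

Both design matrices span the same column space, so the fitted values of a linear model are the same either way; only the interpretation of the coefficients changes.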
Upvotes: 1