Reputation: 5385
How does R treat a categorical/factor variable in a regression setting? Does it just perform (some kind of) one-hot encoding, or something else?
Upvotes: 0
Views: 73
Reputation: 747
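If the categorical variable has k levels, R automatically treats it as k-1 binary dummy variables. With the default (treatment) contrasts for an unordered factor, these dummies indicate levels 2 to k of the variable; the first level is the reference level and is absorbed into the intercept.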
This is done to avoid perfect multicollinearity: if the factor were decomposed into k dummy variables, they would sum to exactly 1 in every row, which is identical to the intercept column. If you choose to run a regression model without an intercept (i.e. lm(y ~ x - 1)), the factor is decomposed into k variables instead.
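You can check this yourself by looking at the design matrix R builds. The snippet below is only a minimal sketch on made-up toy data (the names x and y, the level names, and the values are my own assumptions, not from the question). It uses model.matrix(), which shows the coding that lm() applies under the default contrasts.

```r
# Toy data: a factor with k = 3 levels (hypothetical values for illustration)
x <- factor(c("a", "b", "c", "a", "b", "c"))
y <- c(1.0, 2.1, 2.9, 1.2, 1.9, 3.1)

# With an intercept: k - 1 = 2 dummy columns; level "a" is the reference
mm_int <- model.matrix(~ x)
print(colnames(mm_int))

# Without an intercept: each of the k = 3 levels gets its own column
mm_noint <- model.matrix(~ x - 1)
print(colnames(mm_noint))

# The k dummies sum to 1 in every row, so together they duplicate
# the intercept column; that is the source of the collinearity
print(rowSums(mm_noint))

# lm() fits k coefficients in total either way, just parameterised differently
fit_int   <- lm(y ~ x)      # intercept + 2 differences from level "a"
fit_noint <- lm(y ~ x - 1)  # one mean per level
print(coef(fit_int))
print(coef(fit_noint))
```

In the first fit, the intercept is the mean of the reference level and the other coefficients are differences from it. In the second, each coefficient is simply the mean of y within that level.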
Upvotes: 1