CutePoison

Reputation: 5385

How does R use categorical variables in regression

How does R treat a categorical/factor variable in a regression setting? Does it just perform (some kind of) one-hot encoding or..?

Upvotes: 0

Views: 73

Answers (1)

Spätzle

Reputation: 747

If the categorical variable has k levels, R automatically encodes it as k-1 binary dummy variables. With the default treatment contrasts for an unordered factor, the first level serves as the reference, so the dummies indicate levels 2 to k.
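A minimal sketch of this encoding, assuming a made-up three-level factor `x` (the data are hypothetical and only for illustration). `model.matrix()` displays the design matrix that `lm()` would build for the same formula; under R's default treatment contrasts the first level is absorbed into the intercept:

```r
# Hypothetical toy data: an unordered factor with k = 3 levels
x <- factor(c("a", "b", "c", "a", "b", "c"))

# Design matrix for y ~ x under the default contrasts (contr.treatment)
X <- model.matrix(~ x)

# One intercept column plus k - 1 = 2 dummy columns
colnames(X)   # "(Intercept)" "xb" "xc"
```

Rows belonging to level "a" have zeros in both dummy columns, so the coefficients on `xb` and `xc` are read as differences from level "a".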

This is done to avoid perfect multicollinearity: if the factor were decomposed into k dummy variables, those variables would sum to exactly 1 in every row, which is the value of the intercept column. If you choose to fit a regression model without an intercept (i.e. lm(y ~ x - 1)), the factor is decomposed into k dummy variables instead.
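The collinearity argument can be checked directly. The sketch below again assumes a hypothetical three-level factor `x`; the names `X_full` and `X_bad` are made up for the example:

```r
# Hypothetical toy data: an unordered factor with k = 3 levels
x <- factor(c("a", "b", "c", "a", "b", "c"))

# Dropping the intercept yields the full set of k indicator columns
X_full <- model.matrix(~ x - 1)
colnames(X_full)      # "xa" "xb" "xc"

# Every row sums to 1, which reproduces the intercept column
rowSums(X_full)

# Adding an intercept to all k dummies gives a rank-deficient matrix
X_bad <- cbind(Intercept = 1, X_full)
qr(X_bad)$rank        # 3, although X_bad has 4 columns
```

Because the rank is one less than the number of columns, the coefficients of such a model would not be uniquely determined, which is why one column has to be dropped.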

Upvotes: 1
