Nirmal Roy

Reputation: 61

Multicollinearity of categorical variables

What are the different measures available to check for multicollinearity if the data contains both categorical and continuous independent variables?

Can I use VIF after converting the categorical variables into dummy variables? Is there a fundamental flaw in this approach? I could not find any reference material on it online.

Upvotes: 5

Views: 8895

Answers (1)

aerin

Reputation: 22694

Can I use VIF by converting categorical variables into dummy variables?

Yes, you can. There is no fundamental flaw in this approach.

if the data contains both categorical and continuous independent variables?

Multicollinearity doesn’t care whether a variable is categorical or continuous. There is nothing special about categorical variables. Convert each categorical variable into binary dummy variables (one fewer than its number of levels), and treat them like all other variables.

I assume your concern is that the dummy variables derived from one categorical variable must be correlated with each other, and that is a valid concern. Consider the case where the proportion of cases in the reference category is small. Say a categorical variable has 3 levels: overweight, normal, and underweight. We can encode it as 2 dummy variables, with the remaining level as the reference category. Then, if the reference category has very few cases (say normal is the reference, and only 5 people out of 100 are normal while the other 95 are underweight or overweight), the indicator variables will necessarily have high VIFs, even if the categorical variable is not associated with other variables in the regression model.
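The effect of the reference category can be checked with a small sketch. It uses the 100-person example above. The 48/47 split between overweight and underweight is an assumption made for illustration, and the helper names (`pearson`, `vif_two_predictors`, `dummies`) are hypothetical. With only two predictors, the VIF of each equals 1 / (1 - r^2), where r is their Pearson correlation, so no regression library is needed here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def vif_two_predictors(x, y):
    """VIF of either predictor when the model has exactly two predictors."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

def dummies(levels, reference):
    """Build 0/1 indicator columns for every level except the reference."""
    kept = sorted(set(levels) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in levels] for lvl in kept}

# Assumed split: 48 overweight, 47 underweight, 5 normal (100 people).
levels = ["overweight"] * 48 + ["underweight"] * 47 + ["normal"] * 5

# Small reference category ("normal", 5 cases): the two dummies are
# almost perfectly complementary, so their VIF is inflated.
d_small = dummies(levels, reference="normal")
vif_small_ref = vif_two_predictors(d_small["overweight"], d_small["underweight"])

# Large reference category ("overweight", 48 cases): much lower VIF.
d_large = dummies(levels, reference="overweight")
vif_large_ref = vif_two_predictors(d_large["normal"], d_large["underweight"])

print(f"VIF with 'normal' as reference:     {vif_small_ref:.2f}")
print(f"VIF with 'overweight' as reference: {vif_large_ref:.2f}")
```

Under these assumed counts the VIF comes out at about 5.5 with the small reference category and about 1.05 with the large one, even though no other variable is in the model. Choosing a well-populated level as the reference is one way to avoid this artifact.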

What are the different measures available to check for multicollinearity

One way to detect multicollinearity is to take the correlation matrix of your predictors and check its eigenvalues.

Eigenvalues close to 0 indicate that some predictors are nearly linear combinations of the others, i.e. the predictors are multicollinear.
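A minimal sketch of the eigenvalue check with NumPy follows. The data are synthetic and made up for illustration: `x3` is deliberately built as almost the sum of `x1` and `x2`. The condition number computed at the end (square root of the largest eigenvalue over the smallest) is a common companion diagnostic and is an addition here, not something the answer above prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors: x3 is nearly a linear combination of x1 and x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)

X = np.column_stack([x1, x2, x3])

# Correlation matrix of the predictors (columns are variables).
corr = np.corrcoef(X, rowvar=False)

# eigvalsh is meant for symmetric matrices; eigenvalues come back ascending.
eigenvalues = np.linalg.eigvalsh(corr)

# Ratio of largest to smallest eigenvalue, on the square-root scale.
condition_number = np.sqrt(eigenvalues[-1] / eigenvalues[0])

print("eigenvalues:", eigenvalues)
print("condition number:", condition_number)
```

The eigenvalues of a correlation matrix sum to the number of variables, so one eigenvalue near 0 means the others absorb almost all the variance. The eigenvector belonging to the near-zero eigenvalue shows which variables take part in the near-dependency.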

Upvotes: 0
