Reputation: 61
What are the different measures available to check for multicollinearity if the data contains both categorical and continuous independent variables?
Can I use VIF after converting the categorical variables into dummy variables? Is there a fundamental flaw in this approach? I could not find any reference material on it online.
Upvotes: 5
Views: 8895
Reputation: 22694
Can I use VIF by converting categorical variables into dummy variables?
Yes, you can. There is no fundamental flaw in this approach.
if the data contains both categorical and continuous independent variables?
Multicollinearity doesn’t care whether a variable is categorical or continuous. There is nothing special about categorical variables: convert them into dummy (0/1 indicator) variables and treat them like any other predictor.
I assume your concern is that the dummy variables created from one categorical variable must be correlated with each other, and that is a valid concern. Consider the case where the proportion of cases in the reference category is small. Say a categorical variable has 3 categories: overweight, normal, and underweight. We can turn this into 2 dummy variables, with the remaining category as the reference. If the reference category contains very few cases (say normal is the reference and only 5 out of 100 people are normal, while the other 95 are underweight or overweight), the indicator variables will necessarily have high VIFs, even if the categorical variable is not associated with the other variables in the regression model.
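To make the reference-category effect concrete, here is a minimal sketch using only NumPy. The data are made up for illustration (5 normal, 47 overweight, 48 underweight, plus a hypothetical continuous predictor `age` that is unrelated to the group), and the `vif` and `design` helpers are my own, not from any library. Each VIF is computed as 1 / (1 - R²) from regressing one column on all the others.

```python
import numpy as np

def vif(X):
    """VIF for each column of X, computed as 1 / (1 - R^2) from
    regressing that column on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)

# Hypothetical sample: one category (normal) is very small.
group = np.array(["normal"] * 5 + ["overweight"] * 47 + ["underweight"] * 48)
age = rng.normal(40, 10, size=group.size)  # assumed unrelated to group

def design(reference):
    """Dummy-code `group`, dropping `reference`, and append `age`."""
    levels = [g for g in ("normal", "overweight", "underweight") if g != reference]
    dummies = [(group == g).astype(float) for g in levels]
    return np.column_stack(dummies + [age])

vif_small_ref = vif(design("normal"))      # columns: overweight, underweight, age
vif_large_ref = vif(design("overweight"))  # columns: normal, underweight, age

print("reference = normal:    ", vif_small_ref.round(2))
print("reference = overweight:", vif_large_ref.round(2))
```

With normal as the reference, the two dummies are almost mirror images of each other, so both get VIFs above 5 even though `age` has nothing to do with the group. Switching the reference to a large category brings the dummy VIFs back close to 1. The model fit is the same either way; only the parameterisation changes.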
What are the different measures available to check for multicollinearity
One way to detect multicollinearity is to compute the correlation matrix of your predictors and check its eigenvalues.
Eigenvalues close to 0 indicate that some predictors are nearly linear combinations of the others.
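As a rough sketch of the eigenvalue check, assuming NumPy and some made-up predictors (`x3` is deliberately built as almost exactly `x1 + x2`):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical predictors: x3 is nearly a linear combination of x1 and x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Correlation matrix of the predictors (columns are variables).
R = np.corrcoef(X, rowvar=False)

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(R)

print("eigenvalues:", eigvals)
# The eigenvector of the smallest eigenvalue shows which predictors
# take part in the near-linear dependence (large absolute weights).
print("weights for smallest eigenvalue:", eigvecs[:, 0].round(2))
```

Here the smallest eigenvalue is very close to 0, and its eigenvector puts weight on all three columns, which points at the `x1 + x2 ≈ x3` relationship. Dummy variables can be included as columns of `X` in the same way as continuous ones.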
Upvotes: 0