I have a dataset where:
X1 - categorical independent variable
X2 - continuous independent variable
y - continuous dependent variable
I want to use X1 and X2 to predict y. Is linear regression appropriate here, and does it even make sense to regress on a categorical independent variable? If so, how can I use linear regression when X1 is categorical (e.g. eye colour)?
Should I create a separate linear regression model for each category of X1, or a single multiple linear regression model?
Looking online, I mostly find resources on continuous independent -> continuous dependent (linear regression) or continuous independent -> categorical dependent (logistic regression).
I'd appreciate being pointed to any resources or tools that could help.
You can use linear regression, but you first need to encode X1 as a set of numeric variables.
Here's a simple example using the 'dummy coding' method, with Blue as the reference category (all zeros):
┏━━━━━━━━━━━━┳━━━━━┳━━━━━┓
┃ Eye Colour ┃ x11 ┃ x12 ┃
┣━━━━━━━━━━━━╋━━━━━╋━━━━━┫
┃ Blue       ┃  0  ┃  0  ┃
┣━━━━━━━━━━━━╋━━━━━╋━━━━━┫
┃ Brown      ┃  1  ┃  0  ┃
┣━━━━━━━━━━━━╋━━━━━╋━━━━━┫
┃ Green      ┃  0  ┃  1  ┃
┗━━━━━━━━━━━━┻━━━━━┻━━━━━┛
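As a rough illustration of the table above, here is a minimal sketch in Python with NumPy. It assumes X1 is stored as strings and that Blue is the reference level. `dummy_code` and `fit_ols` are hypothetical helper names used only for this example. The dummy columns are placed next to X2 and an intercept, and a single multiple linear regression is then fitted by least squares:

```python
import numpy as np

# Reference level first: "Blue" maps to all zeros, as in the table above.
CATEGORIES = ["Blue", "Brown", "Green"]

def dummy_code(values, categories=CATEGORIES):
    """Return an (n, k-1) matrix of 0/1 columns, one per non-reference category."""
    values = np.asarray(values)
    return np.column_stack([(values == c).astype(float) for c in categories[1:]])

def fit_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x11 + b2*x12 + b3*X2 (hypothetical helper)."""
    dummies = dummy_code(x1)
    design = np.column_stack([np.ones(len(y)), dummies, np.asarray(x2, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Toy data generated without noise from y = 1 + 2*Brown + 3*Green + 0.5*X2
x1 = ["Blue", "Brown", "Green", "Blue", "Brown", "Green"]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1 + 2 * (c == "Brown") + 3 * (c == "Green") + 0.5 * v for c, v in zip(x1, x2)]

coef = fit_ols(x1, x2, y)  # should recover roughly 1, 2, 3 and 0.5
```

Only k-1 dummy columns are used for k categories. A column for every category plus the intercept would be perfectly collinear. Each dummy coefficient is read as the expected difference in y between that colour and Blue at the same X2. In pandas, `pd.get_dummies(..., drop_first=True)` produces a similar encoding.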
Here's an article that explains different coding methods: