DDM

Reputation: 323

Target (Dependent) Variable Is Continuous but Independent Variables Are Categorical

I am working on a dataset where my dependent variable is continuous but all my independent variables are categorical (non-binary). I have tried one-hot encoding and creating dummy variables. I am getting a low R² of about 0.4 but a high adjusted R² of around 0.9. However, I am getting vertical lines in my regression plot and residual plot, even though the points in my QQ plot fall close to a straight line, apart from some heavy tails at the ends. So is a regression model the right method for this kind of scenario? If yes, how should the plots be analyzed, and if not, what other methods and libraries can be used to get a better result?


Upvotes: 1

Views: 753

Answers (1)

StupidWolf

Reputation: 46908

I'll try to address some of your questions below:

However I am getting vertical lines in my regression plot and residual plot

This is expected if all your independent variables (IVs) are categorical. Each category is encoded as a binary column, so the prediction for each observation is determined entirely by its combination of categories. As a simple illustration, with 2 binary predictors there can only be 4 distinct predictions (0/0, 0/1, 1/0, 1/1). Extend this to many binary variables and you get a discrete set of predicted values, which shows up as vertical lines in the plots.
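To see why the predictions line up in vertical bands, here is a minimal sketch using only NumPy and simulated data (the coefficients, sample size and variable names are hypothetical, not taken from the question). Two binary predictors are fitted by ordinary least squares, and the number of distinct fitted values is compared with the number of distinct category combinations:

```python
import numpy as np

# Simulated (hypothetical) data: two binary predictors and a continuous response.
rng = np.random.default_rng(0)
n = 200
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=1.0, size=n)

# Design matrix: an intercept column plus the two 0/1 dummy columns.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares fit and fitted values.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# Each distinct (x1, x2) combination maps to exactly one predicted value,
# so the fitted values form a handful of discrete levels, not a continuum.
n_combinations = len({(a, b) for a, b in zip(x1, x2)})
n_distinct_preds = len(np.unique(np.round(fitted, 10)))
print(n_combinations, n_distinct_preds)
```

However large the sample, the fitted values can take at most as many distinct values as there are distinct rows in the design matrix; plotting anything against these fitted values therefore stacks the points into vertical columns.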

In other words, there is no slope to speak of, so you should not expect a continuous range of predictions. You can read more about regression with categorical predictors here.

even though my QQ line seems to fit into a straight line with some heavy tails at the end. So may I know if regression model is the right method to be used in this kind of scenario?

Yes, you can still use a linear model.
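As a rough illustration of fitting such a model (simulated data; the level names and means are made up), the sketch below builds a dummy-encoded design matrix by hand, dropping one level as the reference so the dummies are not collinear with the intercept, and fits it with NumPy's least squares. In practice `pandas.get_dummies(..., drop_first=True)` or a formula interface such as statsmodels' `ols("y ~ C(group)", data=df)` can build the same encoding. With a single categorical predictor, every fitted value equals its group's mean, which is also why this kind of model is closely related to one-way ANOVA:

```python
import numpy as np

# Hypothetical data: one categorical predictor with three levels.
rng = np.random.default_rng(1)
levels = np.array(["a", "b", "c"])
group = rng.choice(levels, size=150)
true_means = {"a": 5.0, "b": 7.0, "c": 4.0}
y = np.array([true_means[g] for g in group]) + rng.normal(size=group.size)

# Dummy encoding with "a" as the reference level (its column is dropped).
X = np.column_stack([
    np.ones(group.size),
    (group == "b").astype(float),
    (group == "c").astype(float),
])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# With one categorical predictor, each fitted value is its group's sample mean.
for lvl in levels:
    mask = group == lvl
    print(lvl, round(float(fitted[mask][0]), 3), round(float(y[mask].mean()), 3))
```

The intercept estimates the mean of the reference level, and each remaining coefficient is the difference between that level's mean and the reference mean.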

If its a yes how should the plots be analyzed and if its a no, what are the other methods and libraries that can be employed to yield a better result?

What you have is basically similar to an ANOVA, except that you are not doing inference. You can check the homogeneity of variance using Levene's test or a similar test. Be aware that these tests can be extremely sensitive when you have a large number of observations. Looking at your QQ plot, which compares quantiles, I think it's fine.
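Levene's test essentially runs a one-way ANOVA on the absolute deviations of each observation from its group's center. The sketch below computes the statistic with NumPy only, so it could be applied, for example, to the model's residuals split by category level; the function name `levene_statistic` and the simulated groups are hypothetical. For a p-value, `scipy.stats.levene(*samples, center="mean")` computes the same statistic (its default, `center="median"`, is the Brown-Forsythe variant).

```python
import numpy as np

def levene_statistic(*samples, center="mean"):
    """Levene's W statistic for equality of variances across groups.

    center="mean" gives the original Levene test; center="median" gives the
    Brown-Forsythe variant. Assumes at least two groups and non-zero
    within-group spread of the absolute deviations.
    """
    k = len(samples)
    center_fn = np.mean if center == "mean" else np.median
    # Absolute deviations of each observation from its own group's center.
    z = []
    for s in samples:
        arr = np.asarray(s, dtype=float)
        z.append(np.abs(arr - center_fn(arr)))
    n_i = np.array([zi.size for zi in z])
    n = n_i.sum()
    z_means = np.array([zi.mean() for zi in z])
    z_grand = np.concatenate(z).mean()
    # One-way ANOVA F statistic computed on the absolute deviations.
    between = np.sum(n_i * (z_means - z_grand) ** 2)
    within = sum(np.sum((zi - zm) ** 2) for zi, zm in zip(z, z_means))
    return (n - k) / (k - 1) * between / within

# Usage with simulated residuals (hypothetical): three groups with equal
# spread versus three groups where one level has a much larger spread.
rng = np.random.default_rng(2)
equal = [rng.normal(scale=1.0, size=100) for _ in range(3)]
unequal = [rng.normal(scale=s, size=100) for s in (1.0, 1.0, 3.0)]
print(levene_statistic(*equal))
print(levene_statistic(*unequal))
```

Larger values point to unequal variances across levels. With many observations, even small, practically unimportant differences in spread can come out significant, so it is worth looking at the residual spread per category alongside the test.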

Upvotes: 1
