menhadon
menhadon

Reputation: 11

Excel Correlation R^2 decreases with more factors included

I use Excel to run regression between sales and a bunch of variables. I also always set the y-intercept of linear equation to 0. When I calculate correlation r^2 with more variables, r^2 drops by a lot. This is very strange because r^2 should increase with more variables included.
I am pretty sure I have been calculated regular r^2, not adjusted r^2. If I didn't set y intercept to 0, r^2 would increase with more variables. I also realized that there are some functionality issues with Excel with y intercept is set to 0, so I have been calculating R^2 manually. But the equations I am using should be correct. Does anyone know why R^2 drops with more factors? Is it because i have set y-intercept to 0?

Thank you so much! Jessica

Upvotes: 1

Views: 45

Answers (1)

Vash
Vash

Reputation: 1787

Your R-squared may be dropping because of excluding the intercept term which is forcing the fitted line to pass through the origin(if we were looking at a one predictor example in 2D). IMO, you should include the intercept term even though it doesn't add logical meaning to your model.

This link explains it well.

The constant term is in part estimated by the omission of predictors from a regression analysis. In essence, it serves as a garbage bin for any bias that is not accounted for by the terms in the model. You can picture this by imagining that the regression line floats up and down (by adjusting the constant) to a point where the mean of the residuals is zero, which is a key assumption for residual analysis. This floating is not based on what makes sense for the constant, but rather what works mathematically to produce that zero mean.

So by forcing your line to pass through the origin, you may be interfering with its prediction power and hence the R-squared.

Upvotes: 1

Related Questions