Reputation: 1005
This is really weird. I calculate R^2 values with Excel in two different ways and the results differ hugely. Why?
1) First I use Excel to do a linear regression via a graph, and use the "Add Trendline..." right mouse button functionality to specify Intercept = 0. The R square value shows -3.253. The regressed equation is Y = -0.1321 * X
2) Then I use Excel to do a linear regression via LINEST function. I highlight 5x2 rows and in the top left cell, I type "=LINEST ([Y vector]; [X vector], FALSE, TRUE). The False means the intercept is 0, and the True means Excel should print additional regression statistical information. Then I press CTRL + SHIFT + Enter. This will show me additional statistics, such as R^2 value in the third left cell. Which turns out to be 0.11166. The regressed equation is Y = -0.1321 * X
My question is; what am I doing wrong in calculating R^2 with the graph? Python and statsmodels.api confirms that R^2 is 0.11166, and the regressed equation is Y = -0.1321 * X.
Y =
0.0291970802919708
0.141801551718973
0.145668034655723
0.0691229530946433
0.0431577486597426
0.133618351873374
X =
-0.35551988
-0.20577599
0.10780785
-0.25028796
-0.42762184
0.02442197
Upvotes: 3
Views: 3404
Reputation: 1
Me and my fellow engineers just got tangled up in this. Based on this discussion and what we observed, the R^2 is wrong all of the time except when Excel calculates the best y-intercept. Any other y-intercept (either forced through Zero OR user-defined), is wrong.
Upvotes: 0
Reputation: 334
Your calculation is correct. Scatter plot does not return correct R^2 when the intercept is 0. This is an formula fo R^2
where
If you use standard regression model, you use average value of y as y̅. But when you assume that the intercept equals 0, you need to set y̅ as zero. If you use the average value of y instead of zero, you get the R^2 = -3.252767.
You can see the calculation here. The SStot wrong column uses average value of y as y̅. Then the R^2 value equals to -3.252767. If you use 0 (as I did in SStot right column), then you get 0.111.
It is an old bug described by Microsoft here:https://support.microsoft.com/en-us/help/829249/you-will-receive-an-incorrect-r-squared-value-in-the-chart-tool-in-excel-2003 You need to use the LINEST function to get correct R^2 value.
Upvotes: 3