Reputation: 7468
I'm trying to do a very simple linear regression analysis on a few variables in my dataset, and I'm finding that R and SAS output very different values for their model fits. I am attempting to regress
spending ~ tenure (in months)
In SAS, my code looks like
proc reg data=model_data;
model spending = tenure;
output out=&outfile r=resid stdi=stdi_metric;
title 'SAS model';
run; quit;
In R, I am using the following code:
modelobject <- lm(spending ~ tenure, data = df)
predictions <- predict(modelobject, interval = "prediction", se.fit = TRUE, level = 1 - alpha)
However, what I see is that the residuals in R (and therefore the fitted coefficient and intercept terms) are very different than in SAS. I am not including them here since it's confidential data, but suffice to say they don't match. They DO match, though, when I change my SAS code to
proc reg data=model_data;
model spending = tenure;
output out=&outfile r=resid stdp=stdp_metric; /* <-- this is the only change! */
title 'SAS model';
run; quit;
I get the same residuals and coefficients here. Why is this the case? From my understanding, stdp and stdi are the standard errors associated with confidence and prediction intervals, respectively (see these lecture notes). However, switching between a confidence and a prediction interval shouldn't theoretically change your model's fit (this is especially true in R, since you're passing the same modelobject into your predict() function).
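To make that expectation concrete, here is a minimal R sketch. It uses simulated stand-in data (the real dataset is confidential, so the numbers, the seed, and the names conf_int and pred_int are assumptions for illustration only). It fits one model and requests both interval types from the same object:

```r
# Simulated stand-in data; NOT the confidential dataset from the question.
set.seed(1)
df <- data.frame(tenure = 1:50)
df$spending <- 20 + 3 * df$tenure + rnorm(50, sd = 10)

fit <- lm(spending ~ tenure, data = df)

# Same model object, two interval types. R warns that prediction intervals
# on the training data refer to future responses, so the warning is suppressed.
conf_int <- predict(fit, interval = "confidence")
pred_int <- suppressWarnings(predict(fit, interval = "prediction"))

# The point predictions (and hence the residuals) do not depend on the interval type.
same_fit <- isTRUE(all.equal(conf_int[, "fit"], pred_int[, "fit"]))
same_resid <- isTRUE(all.equal(unname(residuals(fit)),
                               df$spending - unname(pred_int[, "fit"])))

# Only the interval width changes: prediction intervals are wider.
wider <- all((pred_int[, "upr"] - pred_int[, "lwr"]) >
             (conf_int[, "upr"] - conf_int[, "lwr"]))

print(c(same_fit = same_fit, same_resid = same_resid, wider = wider))
```

If the residuals differ between two runs, the cause is therefore more likely to be in the data or the model statement than in the choice of interval.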
So why do the SAS residuals change when the stdi metric is switched to stdp? Moreover, this question is being asked in the broader context of a project where I am attempting to convert old SAS macros into R: how can I replicate the same model fit in R (with SAS's PROC REG using stdi)?
I have also consulted the SAS manuals on the definitions of these metrics and on PROC REG, and cannot find anything explaining why the model fit would change when stdi is changed to stdp.
Upvotes: 2
Views: 725
Reputation: 7468
I figured out what the issue was for me. You actually have to scroll down in the regression output window, because the latest results are further down. A good rule of thumb I learned in SAS: always check whether there is additional output and whether you are looking at the latest results. This, combined with a syntax error in my macro parameters that led me to fit two y targets at the same time, was causing my error.
Upvotes: 0
Reputation: 11955
STDI is the standard error of the individual predicted value, whereas STDP is the standard error of the mean predicted value.
In your R code, se.fit = TRUE makes predict() return the standard error of the predicted mean, which is equivalent to the STDP option in SAS. Note that se.fit only controls whether standard errors are returned; it does not change the fitted values or the residuals. To get the equivalent of the STDI option, combine the two components that predict() returns: sqrt(se.fit^2 + residual.scale^2). Hope this helps!
Don't forget to let us know if it solved your problem :)
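As a rough illustration of how the two SAS quantities map onto R, here is a sketch that computes STDP-like and STDI-like values from a single lm fit. The data are simulated, alpha = 0.05 is an assumed value, and the names stdp, stdi, and resid are chosen only to mirror the SAS output options; treat it as a sketch under those assumptions rather than a verified port of PROC REG.

```r
# Simulated stand-in data and an assumed alpha; neither comes from the question.
set.seed(1)
df <- data.frame(tenure = 1:50)
df$spending <- 20 + 3 * df$tenure + rnorm(50, sd = 10)
alpha <- 0.05

fit <- lm(spending ~ tenure, data = df)

# With se.fit = TRUE, predict() returns a list: fit, se.fit, df, residual.scale.
pred <- suppressWarnings(
  predict(fit, interval = "prediction", se.fit = TRUE, level = 1 - alpha)
)

resid <- unname(residuals(fit))                      # counterpart of r=resid
stdp  <- unname(pred$se.fit)                         # SE of the mean predicted value
stdi  <- sqrt(stdp^2 + pred$residual.scale^2)        # SE of an individual prediction

# Sanity check: the prediction-interval half-width equals t * stdi.
t_crit <- qt(1 - alpha / 2, df = pred$df)
half_width <- unname(pred$fit[, "upr"] - pred$fit[, "fit"])
check <- isTRUE(all.equal(half_width, t_crit * stdi))
print(check)
```

The residuals are computed once from the fitted model, so they are the same whichever standard error is requested alongside them.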
Upvotes: 1