Reputation: 421
I have some extreme outliers throwing off my regression model, so I removed them using IF-THEN-ELSE statements. However, once SAS had dropped those data points completely, it flagged new outliers among the remaining ones. Is there a way to exclude the outliers from the analysis without new ones being flagged?
I calculated Q3 + 1.5 * IQR and used that value as follows:
data lungcancer;
input trt surv age sex @@;
/* create a new variable diff: survival relative to 365 days */
diff = surv - 365;
/* create a new binary variable resp: 1 if surv exceeds 365, else 0 */
if diff > 0 then resp = 1;
else resp = 0;
/* create a new categorical variable sev */
/* the top band includes surv = 2276, so that value is not left with sev missing */
if 1621 <= surv <= 2276 then sev = 0;
else if 456 <= surv <= 1620 then sev = 1;
else if 181 <= surv <= 455 then sev = 2;
else if 1 <= surv <= 180 then sev = 3;
else if surv > 2276 then delete; /* remove outliers above Q3 + 1.5 * IQR */
Upvotes: 0
Views: 461
Reputation: 63424
So you removed some data points at the edge of your data, got a new data set, recalculated the IQR, and ... are surprised that there are new "outliers"?
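This isn't SAS doing anything unusual; it's doing what it was asked, which is flagging points that lie more than 1.5*IQR beyond the quartiles. Once the extreme values are gone, the quartiles and the IQR are recomputed from the remaining data, so the fences can shift and points that used to be inside them can now fall outside. Outlier removal is always up to you (when you're doing things this way, anyway, and not using one of the more advanced procs, I suppose): you decide what counts as an outlier and remove it or not, depending on your data. So, do you think these new data points are outliers? Remove them or not depending on that.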
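If you want the cutoff to stay put, one option is to compute the fence once from the full data and carry it along as a flag instead of deleting rows. A minimal sketch, assuming the unfiltered data sit in a data set called lungcancer_raw (a hypothetical name: your DATA step without the delete line), and using a made-up regression model purely for illustration:

```sas
/* Compute the quartiles once, from the full (unfiltered) data. */
/* lungcancer_raw is a hypothetical name for the data before any deletion. */
proc means data=lungcancer_raw noprint;
  var surv;
  output out=fences q1=q1 q3=q3;
run;

/* Attach the fixed upper fence to every row and flag, rather than delete. */
data lungcancer_flagged;
  if _n_ = 1 then set fences(keep=q1 q3);  /* q1 and q3 are retained on all rows */
  set lungcancer_raw;
  upper_fence = q3 + 1.5 * (q3 - q1);
  outlier = (surv > upper_fence);          /* 1 = above the original fence */
run;

/* Fit on the non-flagged rows only. The model below is an assumption, */
/* not taken from the question; substitute your own. */
proc reg data=lungcancer_flagged;
  where outlier = 0;
  model surv = age sex trt;
run;
quit;
```

Any boxplot or PROC UNIVARIATE output you then run on the subset will still recompute its own quartiles from whatever rows it sees, so it can keep flagging points; the outlier variable is what records your original decision.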
Upvotes: 0