duckman

Reputation: 747

proc rank vs proc means to remove the top and bottom 0.1%

I would like to remove outliers in the top and bottom 0.1%. PROC MEANS has the P99 statistic, which only helps to remove the top 1%, not the top 0.1%. Is there another way to do this? I thought of PROC RANK, but I'm not sure it would give the same result. My code is:

    proc means data=input noprint;
        by date;
        output out=trunc(drop=_FREQ_ _TYPE_) p99(var1)=p99_var1 p99(var2)=p99_var2;
    run;
    data input;
        merge input trunc;
        by date;
        if var1 < p99_var1 and var2 < p99_var2;
    run;

versus

    proc rank data=input out=input percent;
        by date;
        var var1 var2;
        ranks percentile1 percentile2;
    run;
    data input;
        set input;
        /* the PERCENT option puts ranks on a 0-100 scale, so the cutoffs are 0.1 and 99.9 */
        where 0.1 < percentile1 < 99.9 and 0.1 < percentile2 < 99.9;
    run;

I am aware that the first method uses 99% (because I don't know how to get 99.9% with it) while the second uses 99.9%. If I used 99% in the second method as well, which approach would be better, and would the two yield the same result?

Upvotes: 1

Views: 562

Answers (2)

Nissar Ahmed

Reputation: 49

Using the TIES= and FRACTION options of PROC RANK, you should have the flexibility you need for this problem.

Check the SAS documentation here.
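
As a rough sketch of what that could look like (not tested against the asker's data), the step below uses the dataset and variable names from the question (`input`, `date`, `var1`, `var2`); the output names `ranked`, `want`, `frac1` and `frac2` are made up for illustration. FRACTION scales each rank to the range (0, 1] within a BY group, so the 0.1% cutoffs become 0.001 and 0.999.

```sas
/* BY-group processing needs the data sorted by date */
proc sort data=input;
    by date;
run;

/* FRACTION divides each rank by the number of nonmissing values in the BY group; */
/* TIES=MEAN (the default) gives tied values the average of their ranks           */
proc rank data=input out=ranked fraction ties=mean;
    by date;
    var var1 var2;
    ranks frac1 frac2;
run;

/* keep rows whose fractional ranks fall strictly inside the 0.1% tails */
data want;
    set ranked;
    where 0.001 < frac1 < 0.999 and 0.001 < frac2 < 0.999;
run;
```

Note that rank-based cutoffs and percentile-based cutoffs need not select exactly the same rows, especially with ties or small BY groups: the largest fractional rank in a group is always 1, so the maximum is dropped even when a date has far fewer than 1000 observations. Rows with a missing `var1` or `var2` get a missing rank and are dropped by the WHERE clause as well.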

Upvotes: -1

Longfish

Reputation: 7602

proc means only offers a fixed set of predefined percentiles. However, you can specify custom percentiles in proc univariate:

    proc univariate data=sashelp.prdsal3 noprint;
        var actual;
        output out=want pctlpre=P_ pctlpts=0.1,99.9;
    run;
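
Applied to the layout in the question, the same idea might look like the sketch below. It assumes the question's names (`input`, `date`, `var1`, `var2`); `trunc` is reused from the question's first method, while `want` and the `lo`/`hi` suffixes are illustrative choices. PCTLNAME= supplies the suffixes, so the cutoffs come out as `var1_lo`, `var1_hi`, `var2_lo` and `var2_hi`.

```sas
/* sort once so both the BY statement and the merge work */
proc sort data=input;
    by date;
run;

/* one row of cutoffs per date: the 0.1th and 99.9th percentiles of each variable */
proc univariate data=input noprint;
    by date;
    var var1 var2;
    output out=trunc pctlpts=0.1 99.9
           pctlpre=var1_ var2_
           pctlname=lo hi;
run;

/* attach the cutoffs to every row and keep only values strictly inside them */
data want;
    merge input trunc;
    by date;
    if var1_lo < var1 < var1_hi and var2_lo < var2 < var2_hi;
run;
```

The strict inequalities mirror the filter in the question; switch to `<=` if observations sitting exactly on a cutoff should be kept. Rows with a missing `var1` or `var2` fail the comparison and are dropped.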

Upvotes: 2
