cylurian

Reputation: 11

SAS Remove Outliers

I'm looking for a macro or a similar approach in SAS to isolate the outliers in a dataset. I define an outlier as any value above the upper bound, Q3 + 1.5 × IQR, or below the lower bound, Q1 - 1.5 × IQR. I have the following SAS code:

title 'Fall 2015';
proc univariate data = fall2015 freq;
var enrollment_count;
histogram enrollment_count / vscale = percent vaxis = 0 to 50 by 5 midpoints = 0 to 300 by 5;
inset n mean std max min range / position = ne;
run;

I would like to remove the outliers from the fall2015 dataset. I found some macros but couldn't get them to work; several require a class variable, which I don't have. Any ideas on how to separate my data?

Upvotes: 0

Views: 4114

Answers (1)

Reeza

Reputation: 21294

Here's a macro I wrote a while ago to do this, under slightly different rules. I've modified it to meet your criteria (a 1.5 × IQR multiplier). Note that it caps outliers at the bounds rather than deleting them.

  1. Use proc means to calculate Q1, Q3, and the IQR (QRANGE).
  2. Build a macro that caps values at the lower and upper bounds.
  3. Call the macro with call execute, passing the bounds computed from the output of step 1.

    *Calculate IQR and first/third quartiles;
    proc means data=sashelp.class stackods n qrange p25 p75;
    var weight height;
    ods output summary=ranges;
    run;
    
    *create data with outliers to check;
    data capped; 
        set sashelp.class;
        if name='Alfred' then weight=220;
        if name='Jane' then height=-30;
    run;
    
    *macro to cap outliers;
    
    %macro cap(dset=,var=, lower=, upper=);
    
    data &dset;
        set &dset;
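        /* SAS treats missing as smaller than any number, so skip missing values rather than cap them */
        if &var>&upper then &var=&upper;
        else if not missing(&var) and &var<&lower then &var=&lower;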
    run;
    
    %mend;
    
    
    *create cutoffs and execute macro for each variable;
    data cutoffs;
    set ranges;
    lower=p25-1.5*qrange;
    upper=p75+1.5*qrange;
    string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
    call execute(string);
    run;
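
Since the question asks about separating the outliers rather than capping them, the same bounds can also be used to split the data into two datasets. This is only a minimal sketch, assuming the asker's dataset fall2015 and variable enrollment_count; the dataset names bounds, fall2015_clean, and fall2015_outliers are made up for illustration, and it assumes fall2015 has no existing variables named q1, q3, iqr, lower, or upper.

```sas
/* Sketch only: dataset and variable names are taken from the question;   */
/* bounds, fall2015_clean and fall2015_outliers are hypothetical names.   */

/* Step 1: write Q1, Q3 and the IQR to a one-row dataset */
proc means data=fall2015 noprint;
    var enrollment_count;
    output out=bounds (drop=_type_ _freq_) q1=q1 q3=q3 qrange=iqr;
run;

/* Step 2: route each row to the clean or the outlier dataset */
data fall2015_clean fall2015_outliers;
    /* read the single bounds row once; its values are retained on every row */
    if _n_ = 1 then set bounds;
    set fall2015;
    lower = q1 - 1.5*iqr;
    upper = q3 + 1.5*iqr;
    /* missing values compare as less than any number, so keep them explicitly */
    if missing(enrollment_count) or lower <= enrollment_count <= upper
        then output fall2015_clean;
    else output fall2015_outliers;
    drop q1 q3 iqr lower upper;
run;
```

Whether rows with a missing enrollment_count belong in the clean dataset is a judgment call; here they are kept so that only values actually outside the bounds are set aside.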
    

Upvotes: 0
