efz
efz

Reputation: 435

Remove records based on conditions in SAS

I am completely new to SAS and I need to do a simple task. say I have a table like:

![enter image description here

I need to remove all instances of sampCode for which at least one SampCode reflect the condition: 2*res > max. that is, not just removing the row, but the entire sampCode group based on that condition. thanks for helping!

Upvotes: 1

Views: 665

Answers (3)

Tom
Tom

Reputation: 51566

To test if a boolean expression is ever true over a series of values just take the MAX(). SAS evaluates boolean expressions to 1 (true) or 0 (false). So the MAX() is true if any value is true. In your case you want to keep the observations where the condition is never met, so when the MAX() is false.

PROC SQL will happily remerge summary statistics onto all of the detailed observations.

select * 
  from have
  group by sampcode
  having not max(2*res > max)
;

Upvotes: 1

Kermit
Kermit

Reputation: 3117

Try using a double DoW Loop.
It assumes your data set is sorted by sampCode.

data want;
    _keep=1;
    
    do _n_=1 by 1 until (last.sampCode);
        set have;
        by sampCode;
        if 2*res>max then _keep=0;
    end;
    
    do _n_=1 by 1 until (last.sampCode);
        set have;
        by sampCode;
        if _keep=1 then output;
    end;
    
    drop _keep;
run;

As a result, 2014_AT14012534-001 and 2014_AT14044069-001 are removed.

       sampCode        res   max
 2014_AT14036758-001  0.01  0.066
 2014_AT14051994-001  0.01  0.021
 2014_BE2549-14-0021  0.01   0.33
 2014_BE2549-14-0023  0.01   0.06
 2014_BE3013-14-0118  0.01  0.044
 2014_BE3259-14-0019  0.01    0.1
 2014_BE3259-14-0101  0.01  0.037
 2014_BE3320-14-0200  0.01  0.038
 2014_BE3365-14-0005  0.01  0.021
 2014_BE4040-14-0548 0.005   0.11
 2014_BE4685-14-0018  0.01  0.054
 2014_BE4804-14-0057  0.01   0.18
 2014_BE4824-14-0007  0.01   0.03

For an SQL approach, try

proc sql;
    create table want as
    select sampCode, res, max 
    from (select sampCode, res, max, 
            max(case when 2*res>max then 1 else 0 end) as _max 
            from have 
            group by sampCode 
            having _max=0);
quit;

Upvotes: 1

Negdo
Negdo

Reputation: 532

You have to use double DOW, otherwise you only get one observation per sampCode, even in cases where there are numerous observations with the same sampCode.

  data have;
        informat sampCode $40.;
        input sampCode res max;
        datalines;
    2014_AT14012534-001 0.01 0.034
    2014_AT14012534-001 2 0.144
    2014_AT14012534-001 0.01 0.015
    2014_AT14012534-001 0.01 0.075
    2014_AT14012534-002 0.01 0.034
    2014_AT14012534-002 0.01 0.314
    2014_AT14012534-003 0.01 0.034
    2014_AT14012534-003 0.02 0.934
    2014_AT14012534-003 0.01 0.034
    2014_AT14012534-004 3 0.001
    run;



data want;
    keep=1;
    
    do until (last.sampCode);
        set have;
        by sampCode;
        if 2*res>max then keep=0;
    end;
    do until (last.sampCode);
        set have;
        by sampCode;
        if keep=1 then output;
    end;
    
    drop keep;
run;

Upvotes: 0

Related Questions