Reputation: 435
I am completely new to SAS and I need to do a simple task. say I have a table like:
I need to remove all instances of sampCode
for which at least one SampCode
reflect the condition:
2*res > max
. that is, not just removing the row, but the entire sampCode
group based on that condition.
thanks for helping!
Upvotes: 1
Views: 665
Reputation: 51566
To test if a boolean expression is ever true over a series of values just take the MAX(). SAS evaluates boolean expressions to 1 (true) or 0 (false). So the MAX() is true if any value is true. In your case you want to keep the observations where the condition is never met, so when the MAX() is false.
PROC SQL will happily remerge summary statistics onto all of the detailed observations.
select *
from have
group by sampcode
having not max(2*res > max)
;
Upvotes: 1
Reputation: 3117
Try using a double DoW Loop.
It assumes your data set is sorted by sampCode
.
data want;
_keep=1;
do _n_=1 by 1 until (last.sampCode);
set have;
by sampCode;
if 2*res>max then _keep=0;
end;
do _n_=1 by 1 until (last.sampCode);
set have;
by sampCode;
if _keep=1 then output;
end;
drop _keep;
run;
As a result, 2014_AT14012534-001
and 2014_AT14044069-001
are removed.
sampCode res max
2014_AT14036758-001 0.01 0.066
2014_AT14051994-001 0.01 0.021
2014_BE2549-14-0021 0.01 0.33
2014_BE2549-14-0023 0.01 0.06
2014_BE3013-14-0118 0.01 0.044
2014_BE3259-14-0019 0.01 0.1
2014_BE3259-14-0101 0.01 0.037
2014_BE3320-14-0200 0.01 0.038
2014_BE3365-14-0005 0.01 0.021
2014_BE4040-14-0548 0.005 0.11
2014_BE4685-14-0018 0.01 0.054
2014_BE4804-14-0057 0.01 0.18
2014_BE4824-14-0007 0.01 0.03
For an SQL approach, try
proc sql;
create table want as
select sampCode, res, max
from (select sampCode, res, max,
max(case when 2*res>max then 1 else 0 end) as _max
from have
group by sampCode
having _max=0);
quit;
Upvotes: 1
Reputation: 532
You have to use double DOW, otherwise you only get one observation per sampCode, even in cases where there are numerous observations with the same sampCode.
data have;
informat sampCode $40.;
input sampCode res max;
datalines;
2014_AT14012534-001 0.01 0.034
2014_AT14012534-001 2 0.144
2014_AT14012534-001 0.01 0.015
2014_AT14012534-001 0.01 0.075
2014_AT14012534-002 0.01 0.034
2014_AT14012534-002 0.01 0.314
2014_AT14012534-003 0.01 0.034
2014_AT14012534-003 0.02 0.934
2014_AT14012534-003 0.01 0.034
2014_AT14012534-004 3 0.001
run;
data want;
keep=1;
do until (last.sampCode);
set have;
by sampCode;
if 2*res>max then keep=0;
end;
do until (last.sampCode);
set have;
by sampCode;
if keep=1 then output;
end;
drop keep;
run;
Upvotes: 0