Reputation: 17107
I have a SAS dataset like this:
col1 col2 col3 col4 col5 col6
A1 B1 C1 D1 E1 $100
A1 B1 C1 D2 E2 $200
A2 B2 C2 D3 E3 $500
The first 3 columns are my key columns. I need to extract the row that has got the highest value for col6
.
So I can do:
proc sql;
create table temp as
select col1,col2,col3,max(col6) as col6
from dataset
group by 1,2,3;
select * from dataset t1
inner join temp t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2
and t1.col3 = t2.col3 and t1.col6 = t2.col6;
quit;
But how can I achieve the same by making one pass through the data? Is there a way?
Upvotes: 0
Views: 2281
Reputation: 26
data col;
input (col1-col5)(:$2.) col6:comma.;
cards;
A1 B1 C1 D1 E1 $100
A1 B1 C1 D2 E2 $200
A2 B2 C2 D3 E3 $500
;;;;
run;
proc print;
run;
proc summary data=col;
output out=maxrow idgroup(max(col6) out(_all_)=);
run;
proc print;
run;
Upvotes: 0
Reputation: 1807
Your approach is perfectly fine for many uses. If using exactly one pass is actually essential, you could use a data step and a hash object. This reads every record once and updates the single row in the hash object every time it finds a row with a higher col6 than it's yet seen.
data _null_;
if 0 then set have; /*Make sure all vars used in the hash object are set up with the correct types*/
retain highest_so_far;
if _n_ = 1 then do;
highest_so_far = col6;
declare hash hi_row();
hi_row.definekey(co1,col2,col3,col4,col5,col6);
hi_row.definedone();
end;
set have end=eof;
if col6 > highest_so_far then do;
hi_row.clear();
hi_row.add();
highest_so_far = col6;
end;
if (eof) then hi_row.output(want);
run;
If there's a tie for highest, this program would return the first, but it could be modified to return an arbirary number of ties.
Upvotes: 5