Vlada Pleshcheva
Vlada Pleshcheva

Reputation: 81

Indicator variable for maximum value by groups

Is there any more elegant way than that presented below for the following task:

to create Indicator Variables (below "MAX_X1" and "MAX_X2") whithin each group (below "key1") of multiple observation (below "key2") with value 1 if this observation corresponds to the maximum value of the variable in eache group and 0 otherwise

data have;
call streaminit(4321);
do key1=1 to 10;
 do key2=1 to 5;
  do x1=rand("uniform");
     x2=rand("Normal");
     output;
  end;
 end;
end;
run;

proc means data=have noprint;
by key1;
var x1 x2;
output out=max
max= / autoname;
run;

data want;
merge have max;
by key1;
drop _:;
run;

proc sql;
    title "MAX";
    select name into :MAXvars separated by ' '
        from dictionary.columns
            WHERE LIBNAME="WORK" AND MEMNAME="WANT" AND NAME like "%_Max"
                        order by name;
quit;
title;

data want; set want;
    array MAX  (*) &MAXvars;
    array XVars (*) x1 x2;
    array Indicators (*) MAX_X1 MAX_X2;
    do i=1 to dim(MAX);
     if XVars[i]=MAX[i] then Indicators[i]=1; else Indicators[i]=0;
    end;
drop i;
run;

Thanks for any suggestion of optimization

Upvotes: 0

Views: 502

Answers (2)

user667489
user667489

Reputation: 9569

It is also possible to do this in a single data step, provided that you read the input dataset twice - this is an example of a double DOW-loop.

data have;
call streaminit(4321);
do key1=1 to 10;
 do key2=1 to 5;
  do x1=rand("uniform");
     x2=rand("Normal");
     output;
  end;
 end;
end;
run;

/*Sort by key1 (or generate index) if not already sorted*/
proc sort data = have;
    by key1;
run;

data want;
    if 0 then set have;
    array xvars[3,2] x1 x2 x1_max_flag x2_max_flag t_x1_max t_x2_max;
    /*1st DOW-loop*/
    do _n_ = 1 by 1 until(last.key1);
        set have;
        by  key1;
        do i = 1 to 2;
            xvars[3,i] = max(xvars[1,i],xvars[3,i]);
        end;
    end;
    /*2nd DOW-loop*/
    do _n_ = 1 to _n_;
        set have;
        do i = 1 to 2;
            xvars[2,i] = (xvars[1,i] = xvars[3,i]);
        end;
        output;
    end;
    drop i t_:;
run;

This may be a bit complicated to understand, so here's a rough explanation of how it flows:

  • Read one by group with the first DOW-loop, updating rolling max variables as each row is read in. Don't output anything yet.
  • Now read the same by-group again using the second DOW-loop, checking to see whether each row is equal to the rolling max and outputting each row.
  • Go back to first DOW-loop, read the next by-group and repeat.

Upvotes: 1

KnowYourOnion
KnowYourOnion

Reputation: 201

Proc sql can be used with a group by statement to allow summary functions across values of a variable.

    data have;
    call streaminit(4321);
    do key1=1 to 10;
     do key2=1 to 5;
      do x1=rand("uniform");
         x2=rand("Normal");
         output;
      end;
     end;
    end;
    run;

    proc sql;
        create table want
        as select
        key1,
        key2,
        x1,
        x2,
        case 
            when x1 = max(x1) then 1
            else 0 end as max_x1,
        case
            when x2 = max(x2) then 1
            else 0 end as max_x2
        from have
        group by key1
        order by key1, key2;
    quit;

Upvotes: 2

Related Questions