Nip
Nip

Reputation: 369

SAS Generate a new column based on the existing groups

I'm currently working on a report using SAS, which has such a table as below:

Name Country  Pct  Flag
A      USA     40   Y
A      CAN     30   N
A      CHN     30   N
B      BRA     70   N
B      JAP     30   Y

I would like to generate a new column Name_Flag, which is equal to the flag for the record with the highest pct for that name. For instance, name_flag for A should be Y, and for B should be N.

Could anyone give me a little hit how to achieve this in SAS? Really appreciate that:)

Upvotes: 0

Views: 809

Answers (3)

mvherweg
mvherweg

Reputation: 1283

EDIT: Keith's answer is simpler and cleaner. Only if your data is already sorted by name AND the dataset is large, i would advice to take my approach, since it does not require another sort. Otherwise, stick to Keith's approach.

Assuming data is already sorted by name:

*First we find the correct flag per group;
data BEST_PER_GROUP (keep=Name Name_Flag);
    set DATASET;
    by Name;

    *Need to retain this until we looked at all candidates;
    retain highest_pct 0;
    retain Name_Flag '';

    *Find the flag of the highest Pct;
    if Pct > highest_pct then do;
        highest_pct = Pct;
        Name_Flag = Flag;
    end;

    *When having looked at all records for a given Name, output the result;
    if last.Name then do;
        output;
        *Reset for next value group of Name;
        highest_pct = 0;
        Name_Flag = '';
    end;
run;

*Merge it back with your data;
data DATASET;
    merge DATASET BEST_PER_GROUP;
    by Name;
run;

Upvotes: 0

Longfish
Longfish

Reputation: 7602

A slightly simpler solution here.

data have;
input Name $ Country $ Pct  Flag $;
datalines;
A      USA     40   Y
A      CAN     30   N
A      CHN     30   N
B      BRA     70   N
B      JAP     30   Y
;
run;

proc sort data=have;
by name descending pct;
run;

data want;
set have;
by name descending pct;
retain name_flag;
if first.name then name_flag=flag;
run;

Upvotes: 2

Jeremy Bassett
Jeremy Bassett

Reputation: 171

You may need to tweak this as I have no SAS Session running to test with.

proc sort data = flagData;
  by pct descending;
run;

data flagDataDone;
  retain nameWithHighestPct;
  set flagData;
  if _n_ = 1 then do;
    nameWithHighestPct = name;
  end;
  name_flag = 'N';
  if name = nameWithHighestPct then do;
    name_flag = 'Y';
  end;
  drop nameWithHighestPct;
run;

Upvotes: 0

Related Questions