User981636
User981636

Reputation: 3621

Create a group id when grouping by several variables

I would like to create an id variable to identify the unique groups according to several variables.

For example, I have the data cars from sashelp.cars, and I would like to identify unique groups of Make - DriveTrain and Cylinder with the id variable grp_id. So the same Make and same DriveTrain with a different number of Cylinder would be considered as a new group (and hence, a new value in the id variable grp_id).

I tried the following, but this would reset the id variable to 1 when there is a new case, nor really considering every unique combination of Make + DriveTrain + Cylinder as a different group id.

data cars; set sashelp.cars; run;

proc sort data=cars; by Make DriveTrain Cylinders; run;
data cars; set cars; 
grp_id + 1;
by Make DriveTrain Cylinders
if first.Make or first.DriverTrain or first.Cylinders then grp_id = 1; 
run;

Any idea on how to create this grp_id variable using several variables as the criteria?

Upvotes: 1

Views: 1730

Answers (3)

Richard
Richard

Reputation: 27498

You want each combination to have a unique group id. Don't reset the group id. If you were also assigning a seq number within the group, you would reset the sequence variable.

When to reset. For combinations, increment the group id when the last listed by variable is tagged as having state first.

Example:

proc sort data=sashelp.cars out=cars; 
  by Make DriveTrain Cylinders;
run;

data cars;
  set cars;
  by Make DriveTrain Cylinders;

  if first.Cylinders then grp_id + 1;

  * this answer gives you bonus information ! ;
  if first.Cylinders 
    then seq_in_group = 1; 
    else seq_in_group + 1;
run;

Note: Conceptually BY defines a hierarchy of n variables. When a variable in the hierarchy changes value, during a serial pass through the data, its state is set first.varm=1. Additionally all the subordinate levels first. automatic variables will have the same state. In other words, this assertion is true: first.varm+1=1 ... first.varn=1.

Upvotes: 4

PeterClemmensen
PeterClemmensen

Reputation: 4937

As an alternative, here is a hashing approach that does not require sorting.

data cars;
   if _N_ = 1 then do;
      declare hash h ();
      h.definekey ('Make', 'DriveTrain', 'Cylinders');
      h.definedata ('grp_id');
      h.definedone();
   end;

   set sashelp.cars;

   if h.find () ne 0 then grp_id + 1;
   h.replace();
run;

Upvotes: 2

PeterClemmensen
PeterClemmensen

Reputation: 4937

Using your own code, you only have to make a small change

data cars; 
   set sashelp.cars; 
run;

proc sort data=cars; 
   by Make DriveTrain Cylinders; 
run;

data cars;
   set cars;
   by Make DriveTrain Cylinders;
   if first.Make or first.DriveTrain or first.Cylinders then grp_id + 1; 
run;

Upvotes: 1

Related Questions