Reputation: 3621
I would like to create an id variable to identify the unique groups according to several variables.
For example, I have the data cars from sashelp.cars
, and I would like to identify unique groups of Make
- DriveTrain
and Cylinder
with the id variable grp_id
. So the same Make
and same DriveTrain
with a different number of Cylinder
would be considered as a new group (and hence, a new value in the id variable grp_id
).
I tried the following, but this would reset the id variable to 1 when there is a new case, nor really considering every unique combination of Make + DriveTrain + Cylinder
as a different group id.
data cars; set sashelp.cars; run;
proc sort data=cars; by Make DriveTrain Cylinders; run;
data cars; set cars;
grp_id + 1;
by Make DriveTrain Cylinders
if first.Make or first.DriverTrain or first.Cylinders then grp_id = 1;
run;
Any idea on how to create this grp_id
variable using several variables as the criteria?
Upvotes: 1
Views: 1730
Reputation: 27498
You want each combination to have a unique group id. Don't reset the group id. If you were also assigning a seq number within the group, you would reset the sequence variable.
When to reset. For combinations, increment the group id when the last listed by variable is tagged as having state first.
Example:
proc sort data=sashelp.cars out=cars;
by Make DriveTrain Cylinders;
run;
data cars;
set cars;
by Make DriveTrain Cylinders;
if first.Cylinders then grp_id + 1;
* this answer gives you bonus information ! ;
if first.Cylinders
then seq_in_group = 1;
else seq_in_group + 1;
run;
Note: Conceptually BY
defines a hierarchy of n variables. When a variable in the hierarchy changes value, during a serial pass through the data, its state is set first.varm=1. Additionally all the subordinate levels first. automatic variables will have the same state. In other words, this assertion is true: first.varm+1=1 ... first.varn=1.
Upvotes: 4
Reputation: 4937
As an alternative, here is a hashing approach that does not require sorting.
data cars;
if _N_ = 1 then do;
declare hash h ();
h.definekey ('Make', 'DriveTrain', 'Cylinders');
h.definedata ('grp_id');
h.definedone();
end;
set sashelp.cars;
if h.find () ne 0 then grp_id + 1;
h.replace();
run;
Upvotes: 2
Reputation: 4937
Using your own code, you only have to make a small change
data cars;
set sashelp.cars;
run;
proc sort data=cars;
by Make DriveTrain Cylinders;
run;
data cars;
set cars;
by Make DriveTrain Cylinders;
if first.Make or first.DriveTrain or first.Cylinders then grp_id + 1;
run;
Upvotes: 1