Count the number of unique ids for every subset of variables

Question

I want to find the number of unique ids for every subset combination of the variables. For example

data have;
    input id var1 var2 var3;
    datalines;
    5 1 0 0
    5 1 1 1
    5 1 0 1
    5 0 0 0
    6 1 0 0
    7 1 1 1
    8 1 0 1
    9 0 0 0
   10 1 0 0
   11 1 0 0
   12 1 . 1
   13 0 0 1
;
run;

I want the result to be

var1    var2    var3 count
.       .       0       5
.       .       1       5
.       0       .       7
.       0       0       5
.       0       1       3
.       1       .       2
.       1       1       2
0       .       .       3
0       .       0       2
0       .       1       1
0       0       .       3
0       0       0       2
0       0       1       1
1       .       .       7
1       .       0       4
1       .       1       4
1       0       .       5
1       0       0       4
1       0       1       2
1       1       .       2
1       1       1       2

which is the result of appending all the possible proc sql; group bys (var1 is shown below)

proc sql;
create table sub1 as
     select var1, count(distinct id) as count
     from have
     where not missing(var1)
     group by var1
;
quit;

I don't care about the case where all variables are missing or when any of the variables in the group by are missing. Is there a more efficient way of doing this?

Richard · Accepted Answer

You can use Proc SUMMARY to compute the combinations of var1-var3 values for each id by group. From the SUMMARY output a SQL query can count the distinct ids per combination.

Example:

data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;

proc summary noprint missing data=have;
  by id;
  class var1-var3;
  output out=combos;
run;

proc sql;
  create table want as
  select   var1, var2, var3, count(distinct id) as count
  from combos
  group by var1, var2, var3
  ;

Count the number of unique ids for every subset of variables

Answers (1)

Related Questions