Reputation: 61
I am looking for a way to produce a frequency table displaying number of unique values for variable ID
per uniqe value of variable Subclass
.
I would like to order the results by variable Class
.
Preferably I would like to display the number of unique values for ID
per Subclass
as a fraction of n for ID
. In the want-example below this values is displayed under %totalID.
In addition I would like to display the number of unique values for ID
per Subclass
as a fraction of the sum of unique ID
values found within each Class
. In the want-example below this values is displayed under %withinclassID.
Have:
ID Class Subclass
-------------------------------
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
Want:
Unique number
Class Subclass of IDs %totalID %withinclassID
--------------------------------------------------------------------
1
1a 3 100.0 50.00
1b 2 66.67 33.33
1c 1 33.33 16.67
SUM 6
2
2a 3 100.0 75.00
2b 1 33.33 25.00
SUM 4
3
3a 2 66.67 66.67
3b 1 33.33 33.33
SUM 3
My initial approach was to perform a PROC FREQ on NLEVELS producing a frequency table for number of unique IDs per subclass. Here however I lose information on class. I therefore cannot order the results by class.
My second approach involved using PROC TABULATE. I however cannot produce any percentage calculations based on unique counts in such a table.
Is there a direct way to tabulate the frequencies of one variable according to a second variable, grouped by a third variable--displaying overall and within group percentages?
Upvotes: 0
Views: 4294
Reputation: 7602
The nlevels
option in proc freq
can produce the unique count you're after without losing data, providing you include Class and Subclass variables in the by
statement. That also means you'll have to presort the data by the same variables.
Then you could try proc tabulate
to get the rest of your requirement.
data have;
input ID $ Class Subclass $;
datalines;
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
;
run;
proc sort data=have;
by class subclass;
run;
ods output nlevels = unique_id_count;
proc freq data=have nlevels;
by class subclass;
run;
Upvotes: 1
Reputation: 21264
You can do a double proc freq or SQL.
/*This demonstrates how to count the number of unique occurences of a variable across groups. It uses the SASHELP.CARS dataset which is available with any SAS installation. The objective is to determine the number of unique car makers by origin/
Note: The SQL solution can be off if you have a large data set and these are not the only two ways to calculate distinct counts. If you're dealing with a large data set other methods may be appropriate.*/
*Count distinct IDs;
proc sql;
create table distinct_sql as
select origin, count(distinct make) as n_make
from sashelp.cars
group by origin;
quit;
*Double PROC FREQ;
proc freq data=sashelp.cars noprint;
table origin * make / out=origin_make;
run;
proc freq data=origin_make noprint;
table origin / out= distinct_freq outpct;
run;
title 'PROC FREQ';
proc print data=distinct_freq;
run;
title 'PROC SQL';
proc print data=distinct_sql;
run;
Upvotes: 2