Reputation: 19
I would like to count parts of the dataset.
This is my dataset:
YEAR ㅣ FIRMCode ㅣ FIRMName
2000 ㅣ 10 ㅣ 1
2001 ㅣ 11 ㅣ 1
.
.
2020 ㅣ 17 ㅣ 1
2000 ㅣ 11 ㅣ 2
.
.
2020 ㅣ 16 ㅣ 2
I want to count the number of types of firm codes each year, regardless of the firm name. The firm codes are from 10 to 20. So my output would look like:
YEAR ㅣ FIRMCode(10) ㅣ FIRMCode(11) ... ㅣ FIRMCode(20)
2000 ㅣ #firms with code10 ㅣ #firms with code11
.
.
2020 ㅣ #firms with code10 ㅣ #firms(11)
Thank you so much in advance.
Upvotes: 0
Views: 287
Reputation: 27498
One way is to use SQL to compute count (distinct FIRMName)
within a group and then Proc REPORT
or Proc TABULATE
to present the distinct counts as columns.
Your data does not appear to model a scenario in which a firm can have multiple codes within a year; if so, the need for distinct
is not specifically needed.
Example:
Compute the number of firms with a code within a year, maintaining a categorical data structure. Present the counts in a wide layout presentation. Also, show a data transformation from two level category into a wide data set (generally not recommended)
data have;
call streaminit (20210425);
do firmname = 'a', 'b', 'c', 'd';
do year = 2000 to 2010;
code = 10 + rand('integer', 0, 5);
output;
end;
end;
run;
proc sql;
create table counts as
select year, code, count(distinct firmname) as firm_ucount
from have
group by year, code
order by year, code;
proc tabulate data=counts;
class year code;
var firm_ucount;
table year, code * firm_ucount='' * mean='' * f=4.;
run;
proc report data=counts;
columns year firm_ucount,code;
define year / group;
define code / '' across;
define firm_ucount / '#firms with code';
run;
proc transpose data=counts out=want prefix=n_code_;
by year;
id code;
var firm_ucount;
run;
TABULATE
, count class in column-dimension
REPORT
, count variable ACROSS
TRANSPOSE
, code as part of column name
Upvotes: 0