Reputation: 1
I already have the answer in proc sql but I need the data step version of my code. If someone could please help me convert it, I would be grateful.
PROC SQL;
CREATE TABLE CARS AS
SELECT Origin, Type, Cylinders, DriveTrain, COUNT(*) AS COUNT
FROM SASHELP.CARS
group by Origin, Type, Cylinders, DriveTrain;
QUIT;
Upvotes: 0
Views: 4907
Reputation: 51621
As long you none of your key variables have missing values and the full summary table will fit into your available memory you could use data step HASH.
That will eliminate the need to pre-sort the data.
data _null_;
set sashelp.cars end=eof;
if _n_=1 then do;
declare hash h(ordered:'yes');
rc=h.definekey('Origin','Type','Cylinders','DriveTrain');
rc=h.definedata('Origin','Type','Cylinders','DriveTrain','count');
rc=h.definedone();
end;
if h.find() then count=0;
count+1;
rc=h.replace();
if eof then rc=h.output(dataset:'cars2');
run;
Upvotes: 1
Reputation: 21294
Data step would not be the appropriate solution here, PROC FREQ would be the SAS solution.
proc freq data=sashelp.cars;
table origin*type*cylinders*drivetrain / out=cars list;
run;
For completeness, here's a data step approach. Very much not recommended:
*sort for BY statement is required;
proc sort data=sashelp.cars out=cars_sorted;
by origin type cylinders drivetrain;
run;
data cars_count;
set cars_sorted;
by origin type cylinders drivetrain;
*RETAIN tells SAS to keep this variable across rows, otherwise it resets for each observation;
retain count;
*if first in category set count to 0;
if first.drivetrain then count=0;
*increment count for each record (implicit retain so RETAIN is not actually required here);
count+1;
*if last of the group then output the total count for that group;
if last.drivetrain then output;
*keep only variables of interest;
keep origin type cylinders drivetrain count;
run;
*display results;
proc print data=cars_count;
run;
Upvotes: 1