missy
missy

Reputation: 1

How to do a group by count on multiple columns in SAS (data step)?

I already have the answer in proc sql but I need the data step version of my code. If someone could please help me convert it, I would be grateful.

PROC SQL;
CREATE TABLE CARS AS
SELECT Origin, Type, Cylinders, DriveTrain, COUNT(*) AS COUNT
FROM SASHELP.CARS 
group by Origin, Type, Cylinders, DriveTrain;
QUIT;

Upvotes: 0

Views: 4907

Answers (2)

Tom
Tom

Reputation: 51621

As long you none of your key variables have missing values and the full summary table will fit into your available memory you could use data step HASH.

That will eliminate the need to pre-sort the data.

data _null_;
  set sashelp.cars end=eof;
  if _n_=1 then do;
     declare hash h(ordered:'yes');
     rc=h.definekey('Origin','Type','Cylinders','DriveTrain');
     rc=h.definedata('Origin','Type','Cylinders','DriveTrain','count');
     rc=h.definedone();
  end;
  if h.find() then count=0;
  count+1;
  rc=h.replace();
  if eof then rc=h.output(dataset:'cars2');
run;

Upvotes: 1

Reeza
Reeza

Reputation: 21294

Data step would not be the appropriate solution here, PROC FREQ would be the SAS solution.

proc freq data=sashelp.cars;
table origin*type*cylinders*drivetrain / out=cars list;
run;

For completeness, here's a data step approach. Very much not recommended:

  • Sort data set first by grouping variables
  • Use BY Group in data step to identify groups of interest
  • Use RETAIN to hold value across rows
  • Use FIRST./LAST. to accumulate counter and output
*sort for BY statement is required;
proc sort data=sashelp.cars out=cars_sorted;
by origin type cylinders drivetrain;
run;

data cars_count;
set cars_sorted;
by origin type cylinders drivetrain;

*RETAIN tells SAS to keep this variable across rows, otherwise it resets for each observation;
retain count;

*if first in category set count to 0;
if first.drivetrain then count=0;

*increment count for each record (implicit retain so RETAIN is not actually required here);
count+1;

*if last of the group then output the total count for that group;
if last.drivetrain then output;

*keep only variables of interest;
keep origin type cylinders drivetrain count;
run;

*display results;
proc print data=cars_count;
run;

Upvotes: 1

Related Questions