B--rian
B--rian

Reputation: 5870

Building a table with sub-summary-rows in SAS

Assume I have two clusters of clients and for each client I have a value, how much worth the client is for me. I want to now create a summary table with rows sub-summing the all clients' values for that cluster.

Example input table clients:

client  cluster  value
1       1        3
2       2        1
3       1        4
4       1        1
5       2        5 

Expected output wanted:

cluster  client   value    comment
1        1        3        kEuro
1        3        4        kEuro
1        4        1        kEuro
1        .        8        sum for cluster 1
2        2        1        kEuro
2        5        5        kEuro
2        .        6        sum for cluster 2

I am now looking for the most efficient way of achieving this. What I came up with so far is the code below, which does not fully solve it: Instead of sum for cluster x I only see sum f.

proc sql;
  create table part_1 as
  select cluster, client, value
       , "kEuro" as comment
  from clients
  where cluster = 1;
quit;

proc sql;
  create table part_2 as
  select cluster, client, value
       , "kEuro" as comment
  from clients
  where cluster = 2;
quit;

/* The summation over the clusters */
proc means noprint data=part1;
  output out=psum_1 (drop=_type_ _freq_) sum=;
run;

proc means noprint data=part2;
  output out=psum_2 (drop=_type_ _freq_) sum=;
run;

/* The concatenation stage 1 */
data puzzle_1;
  set part_1 psum_1 (in=in2);
  if in2 then do;
    client = .;
    comment = 'sum for cluster 1';
  end;
run;

data puzzle_2;
  set part_2 psum_2 (in=in2);
  if in2 then do;
    client = .;
    comment = 'sum for cluster 2';
  end;
run;

/* The concatenation stage 2 */
data wanted;
  set puzzle_1 puzzle_2;
run;

Is there a better way to approach this problem, ideally something which I could loop over, if I have more than two clusters?

Upvotes: 2

Views: 168

Answers (2)

Tom
Tom

Reputation: 51566

You can use PROC MEANS/SUMMARY to generate the summary rows. Then use a SET with a BY to add the summary rows to the detail rows.

data have;
 input client cluster value ;
cards;
1 1 3
2 2 1
3 1 4
4 1 1
5 2 5
;

proc sort data=have; by cluster client; run;
proc means data=have noprint nway ;
  by cluster;
  var value;
  output out=sum sum= ;
run;

data want;
  set have sum ;
  by cluster;
  length comment $20 ;
  if last.cluster then comment=catx(' ','sum for cluster',cluster);
  else comment='kEuro';
run;

proc print ;
  var cluster client value comment;
run;

Upvotes: 2

Richard
Richard

Reputation: 27508

Typically you would not add summary rows to the data set containing the details being aggregated. Instead, consider using a reporting tool such as Proc REPORT, Proc TABULATE or Proc SUMMARY

Example:

Show details and summary row using Proc REPORT

data have; input 
client  cluster  value; datalines;
1       1        3
2       2        1
3       1        4
4       1        1
5       2        5 
;

ods html file='report.html';

proc report data=have;
  columns cluster client value;
  define cluster / order;
  define client / display;
  define value / sum;
  break after cluster / summarize style=[fontstyle=italic background=lightgray]; 

run;

ods html close;

enter image description here

If you absolutely feel the need to make your original data more difficult to use further downstream:

  • add a CLASS cluster statement to your Proc MEANS
  • SORT original data by cluster
  • stack original data with MEANS output using SET original means_output; by cluster;

Upvotes: 1

Related Questions