Reputation: 5870
Assume I have two clusters of clients and for each client I have a value, how much worth the client is for me. I want to now create a summary table with rows sub-summing the all clients' values for that cluster.
Example input table clients
:
client cluster value
1 1 3
2 2 1
3 1 4
4 1 1
5 2 5
Expected output wanted
:
cluster client value comment
1 1 3 kEuro
1 3 4 kEuro
1 4 1 kEuro
1 . 8 sum for cluster 1
2 2 1 kEuro
2 5 5 kEuro
2 . 6 sum for cluster 2
I am now looking for the most efficient way of achieving this. What I came up with so far is the code below, which does not fully solve it: Instead of sum for cluster x
I only see sum f
.
proc sql;
create table part_1 as
select cluster, client, value
, "kEuro" as comment
from clients
where cluster = 1;
quit;
proc sql;
create table part_2 as
select cluster, client, value
, "kEuro" as comment
from clients
where cluster = 2;
quit;
/* The summation over the clusters */
proc means noprint data=part1;
output out=psum_1 (drop=_type_ _freq_) sum=;
run;
proc means noprint data=part2;
output out=psum_2 (drop=_type_ _freq_) sum=;
run;
/* The concatenation stage 1 */
data puzzle_1;
set part_1 psum_1 (in=in2);
if in2 then do;
client = .;
comment = 'sum for cluster 1';
end;
run;
data puzzle_2;
set part_2 psum_2 (in=in2);
if in2 then do;
client = .;
comment = 'sum for cluster 2';
end;
run;
/* The concatenation stage 2 */
data wanted;
set puzzle_1 puzzle_2;
run;
Is there a better way to approach this problem, ideally something which I could loop over, if I have more than two clusters?
Upvotes: 2
Views: 168
Reputation: 51566
You can use PROC MEANS/SUMMARY to generate the summary rows. Then use a SET with a BY to add the summary rows to the detail rows.
data have;
input client cluster value ;
cards;
1 1 3
2 2 1
3 1 4
4 1 1
5 2 5
;
proc sort data=have; by cluster client; run;
proc means data=have noprint nway ;
by cluster;
var value;
output out=sum sum= ;
run;
data want;
set have sum ;
by cluster;
length comment $20 ;
if last.cluster then comment=catx(' ','sum for cluster',cluster);
else comment='kEuro';
run;
proc print ;
var cluster client value comment;
run;
Upvotes: 2
Reputation: 27508
Typically you would not add summary rows to the data set containing the details being aggregated. Instead, consider using a reporting tool such as Proc REPORT
, Proc TABULATE
or Proc SUMMARY
Example:
Show details and summary row using Proc REPORT
data have; input
client cluster value; datalines;
1 1 3
2 2 1
3 1 4
4 1 1
5 2 5
;
ods html file='report.html';
proc report data=have;
columns cluster client value;
define cluster / order;
define client / display;
define value / sum;
break after cluster / summarize style=[fontstyle=italic background=lightgray];
run;
ods html close;
If you absolutely feel the need to make your original data more difficult to use further downstream:
CLASS cluster
statement to your Proc MEANS
SORT
original data by cluster
SET original means_output; by cluster;
Upvotes: 1