Reputation: 290
I want to compute multiple sums on the same column based on some criteria. Here is a small example using the sashelp.cars
dataset.
The code below somewhat achieves what I want to do in three (3) different ways, but there is always a small problem.
proc report data=sashelp.cars out=test2;
column make type,invoice type,msrp;
define make / group;
define type / across;
define invoice / analysis sum;
define msrp / analysis sum;
title "Report";
run;
proc print data=test2;
title "Out table for the report";
run;
proc summary data=test nway missing;
class make type;
var invoice msrp;
output out=sumTest(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=sumTest out=test3;
by make;
var invoice msrp;
id type;
run;
proc print data=test3;
title "Table using proc summary followed by proc transpose";
run;
proc sql undo_policy=none;
create table test4 as select
make,
sum(case when type='Sedan' then invoice else 0 end) as SedanInvoice,
sum(case when type='Wagon' then invoice else 0 end) as WagonInvoice,
sum(case when type='SUV' then invoice else 0 end) as SUVInvoice,
sum(case when type='Sedan' then msrp else 0 end) as Sedanmsrp,
sum(case when type='Wagon' then msrp else 0 end) as Wagonmsrp,
sum(case when type='SUV' then msrp else 0 end) as SUVmsrp
from sashelp.cars
group by make;
quit;
run;
proc print data=test4;
title "Table using SQL queries and CASE/WHEN to compute new columns";
run;
Here is the result I get when I run the presented code.
The first two tables represent the result and the out table of the report
procedure. The problem I have with this approach is the column names produced by proc report
. I would love to be able to define them myself, but I don't see how I can do this. It is important for further referencing.
The third table represent the result of the proc summary
/proc transpose
portion of the code. The problem I have with this approach is that Invoice
and MSRP
appears as rows in the table, instead of columns. For that reason, I think the proc report
is better.
The last table represents the use of an SQL
query. The result is exactly what I want, but the code is heavy. I have to do a lot of similar computation on my dataset and I believe this approach is cumbersome.
Could you help improve one of these methods ?
Upvotes: 0
Views: 411
Reputation: 27498
Use Proc TABULATE
. Very succinct expressions for specifying row and column dimensions defined by desired hierarchy of class variables.
The intersection of these dimensions is a cell and represents a combination of values that select the values for which a statistical measure is displayed in the cell.
In your case the SUM
is sum of dollars, which might not make sense when the cell has more then one contributing value.
For example: Does it make sense to show the invoice
sum
for 11
Volkswagen
Sedan
's is $335,813
?
Also note the 'inverted' hierarchy used to show the number of contributing values.
Example:
proc tabulate data=sashelp.cars;
class make type;
var invoice msrp;
table
make=''
,
type * invoice * sum=''*f=dollar9.
type * msrp * sum=''*f=dollar9. /* this is an adjacent dimension */
(invoice msrp) * type * n='' /* specify another adjacent dimension, with inverted hierarchy */
/
box = 'Make'
;
where make =: 'V';
run;
Upvotes: 0
Reputation: 51566
You can just use two PROC TRANSPOSE steps;
proc summary data=sashelp.cars nway missing;
where make=:'V';
class make type;
var invoice msrp;
output out=step1(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=step1 out=step2;
by make type ;
var invoice msrp;
run;
proc transpose data=step2 out=step3(drop=_name_);
by make;
id type _name_ ;
var col1 ;
run;
proc print data=step3;
title "Table using proc summary followed by 2 proc transpose steps";
run;
Results:
Sedan Sedan Wagon Wagon
Obs Make SUVInvoice SUVMSRP Invoice MSRP Invoice MSRP
1 Volkswagen $32,243 $35,515 $335,813 $364,020 $77,184 $84,195
2 Volvo $38,851 $41,250 $313,990 $333,240 $57,753 $61,280
Upvotes: 1