Reputation: 572
Consider the following test data set:
data test;
input Drug $ Quantity State $ Year;
datalines;
A 10 NY 2013
A 20 NY 2014
B 110 NY 2013
B 210 NY 2014
A 50 OH 2013
A 60 OH 2014
B 150 OH 2013
B 260 OH 2014
A 22 NY 2014
B 100 OH 2013
;
RUN;
The following code below sums the quantities of drugs A and B by Drug and State for the year 2013:
proc sql;
create table testnew as
select *, sum(Quantity) as total from test
where Year=2013
group by Drug,State;
quit;
I am interested in getting the proportion of each drug to the total quantity for each state. So for example in Ohio there are a total of 300 units of drug A and B in 2013. The proportion of A would be 50/300 and the proportion of B would be 250/300.
The code below gets the total drugs by state:
proc sql;
create table testnew1 as
select *, sum(Quantity) as total1 from test
where Year=2013
group by State;
quit;
I was thinking that I could then merge test
and test1
and divide total
by total1
to get the proportions. But is there an easier way of doing this?
Upvotes: 4
Views: 3641
Reputation: 7119
If you need to calculate sub totals without Window you will spend quite a lot of time writing SQL. Anyway this will return the right results:
In the first sub-query I calculate the totals for each Year/State/Drug. I'm doing this because you could have the same drug on multiple rows.
In the second sub-query the total for each Year/State
Then I joined them to calculate the percentage
.
proc sql;
create table testnew1 as
select s1.Year,
s1.State,
s1.Drug,
sum(s1.total_by_drug) as Quantity,
sum(s2.total_by_state) ,
sum(s1.total_by_drug) / sum(s2.total_by_state) as PCT_By_drug format=percent9.2
from ( /* Total by drug */
select Year,
State,
Drug,
sum(Quantity) as total_by_drug
from test
group by Year,
State,
Drug
) s1
inner join ( /* Total by state */
select Year,
State,
sum(Quantity) as total_by_state
from test
group by Year,
State
) s2
on s1.State = s2.State
and s1.Year = s2.Year
where s1.Year = 2013 /* if you need to filter the year */
group by s1. Year,
s1.State,
s1.Drug
;
run;
But if you want to give a shot to SAS PROC REPORT:
PROC REPORT DATA= test;
COLUMN State Drug Quantity,SUM pctDrugByState ;
DEFINE State / GROUP;
DEFINE Drug / GROUP;
DEFINE pctDrugByState / COMPUTED FORMAT=percent8.1 'Percent of State Total';
* BREAK AFTER State / SUMMARIZE; * This shows total by State;
WHERE Year =2013;
compute before State;
totState = Quantity.sum;
endcomp;
compute pctDrugByState;
pctDrugByState = Quantity.sum / totState;
endcomp;
RUN;
Upvotes: 0
Reputation: 671
First of all when summarizing variables in SQL you should avoid including input variables other than the "group by" vars and the summarized ones in the final table. This to prevent a duplication of the rows.
The first SQL you wrote outputs 5 rows even if the drug/state combinations are only 4. So instead of selecting * it's better to specify the grouping variables and to use the numeric notation in the "group by" clause:
proc sql;
create table testnew as
select State,
Drug,
sum(Quantity) as total
from test
where Year=2013
group by 1, 2;
quit;
To have the proportions of each drug relative to the State total you can use a subquery where you compute the total by State and than directly use it in the outer query:
proc sql;
create table testnew1 as
select State,
Drug,
sum(Quantity) as total,
total_by_state,
(calculated total) / total_by_state as proportion format=percent9.2
from (select *,
sum(Quantity) as total_by_state
from test
where Year=2013
group by State)
where Year=2013
group by 1, 2;
quit;
If you want you can then remove the where clause and include the Year variable in the group by, both in the outer and inner query.
Upvotes: 1