NebulousReveal
NebulousReveal

Reputation: 572

Group By Statement proc sql

Consider the following test data set:

data test;
input Drug $ Quantity State $ Year;
datalines;
A 10 NY 2013
A 20 NY 2014
B 110 NY 2013
B 210 NY 2014   
A 50 OH 2013
A 60 OH 2014
B 150 OH 2013
B 260 OH 2014       
A 22 NY 2014
B 100 OH 2013
;
RUN;

The following code below sums the quantities of drugs A and B by Drug and State for the year 2013:

    proc sql;
    create table testnew as
    select *, sum(Quantity) as total from test
    where Year=2013
    group by Drug,State;
    quit;

I am interested in getting the proportion of each drug to the total quantity for each state. So for example in Ohio there are a total of 300 units of drug A and B in 2013. The proportion of A would be 50/300 and the proportion of B would be 250/300.

The code below gets the total drugs by state:

  proc sql;
  create table testnew1 as
  select *, sum(Quantity) as total1 from test
  where Year=2013
  group by State;
  quit;

I was thinking that I could then merge test and test1 and divide total by total1 to get the proportions. But is there an easier way of doing this?

Upvotes: 4

Views: 3641

Answers (2)

mucio
mucio

Reputation: 7119

If you need to calculate sub totals without Window you will spend quite a lot of time writing SQL. Anyway this will return the right results:

  1. In the first sub-query I calculate the totals for each Year/State/Drug. I'm doing this because you could have the same drug on multiple rows.

  2. In the second sub-query the total for each Year/State

  3. Then I joined them to calculate the percentage

.

proc sql;
    create table testnew1 as
        select s1.Year,
               s1.State,
               s1.Drug,
               sum(s1.total_by_drug) as Quantity,
               sum(s2.total_by_state) ,
               sum(s1.total_by_drug) / sum(s2.total_by_state) as PCT_By_drug format=percent9.2
          from (  /* Total by drug */
                  select Year,
                         State, 
                         Drug,
                         sum(Quantity) as total_by_drug
                    from test
                group by Year,
                         State,
                         Drug
               ) s1
    inner join ( /* Total by state */
                  select Year,
                         State, 
                         sum(Quantity) as total_by_state
                    from test
                group by Year,
                         State
               ) s2
            on s1.State = s2.State
           and s1.Year = s2.Year
         where s1.Year = 2013 /* if you need to filter the year */
      group by s1. Year,
               s1.State,
               s1.Drug
;
run;

But if you want to give a shot to SAS PROC REPORT:

PROC REPORT DATA= test;
    COLUMN State Drug Quantity,SUM pctDrugByState ;
    DEFINE State / GROUP;
    DEFINE Drug / GROUP;
    DEFINE pctDrugByState / COMPUTED FORMAT=percent8.1 'Percent of State Total';
    * BREAK AFTER State / SUMMARIZE; * This shows total by State;
    WHERE Year =2013;

    compute before State; 
        totState = Quantity.sum; 
    endcomp;

    compute pctDrugByState; 
        pctDrugByState = Quantity.sum / totState; 
    endcomp;

RUN;

Upvotes: 0

DaBigNikoladze
DaBigNikoladze

Reputation: 671

First of all when summarizing variables in SQL you should avoid including input variables other than the "group by" vars and the summarized ones in the final table. This to prevent a duplication of the rows.

The first SQL you wrote outputs 5 rows even if the drug/state combinations are only 4. So instead of selecting * it's better to specify the grouping variables and to use the numeric notation in the "group by" clause:

proc sql;
    create table testnew as
    select  State,
            Drug, 
            sum(Quantity) as total 
        from test
        where Year=2013
        group by 1, 2;
quit;

To have the proportions of each drug relative to the State total you can use a subquery where you compute the total by State and than directly use it in the outer query:

proc sql;
    create table testnew1 as
    select  State,
            Drug, 
            sum(Quantity) as total,
            total_by_state,
            (calculated total) / total_by_state as proportion format=percent9.2
        from (select *, 
                    sum(Quantity) as total_by_state
                from test
                where Year=2013
                group by State)
        where Year=2013
        group by 1, 2;
quit;

If you want you can then remove the where clause and include the Year variable in the group by, both in the outer and inner query.

Upvotes: 1

Related Questions