How to write proc sql without windowfunction over partition by sum?

Question

I just started learning SAS and realised that proc sql don't use window functions. As I am more at ease with sql I was wondering how I can simulate a sum window function in proc?

desired result

 select a.active, a.store_id, a.nbr, sum(nbr) over (partition by a.store_id)
from(select active, store_id, count(customer_id) as nbr from customer group by active, store_id) as a
;

active	store_id	nbr	sum
0	1	8	326
1	1	318	326
0	2	7	273
1	2	266	273

eg of raw data

select active, store_id, customer_id
from customer
limit 10;

active	store_id	customer_id
1	1	1
1	1	2
1	2	3
1	2	4
1	1	5
1	1	6
0	1	7
1	2	8
1	1	9
1	2	10

current result and query

    select a.active, a.store_id, a.nbr, sum(nbr) 
    from(select active, store_id, count(customer_id) as nbr from customer group by active, store_id) as a
    group by a.active, a.store_id, a.nbr;

active	store_id	nbr	sum
0	1	8	8
1	1	318	318
0	2	7	7
1	2	266	266

Stu Sztukowski · Accepted Answer

You can do the equivalent in proc sql by merging two sub-queries: one for the count of customers by active, store_id, and another for the total customers for each store_id.

proc sql noprint;
    create table want as
        select t1.active
             , t1.store_id
             , t1.nbr
             , t2.sum
        from (select active
                   , store_id
                   , count(customer_id) as nbr
              from have 
              group by store_id, active
             ) as t1

        LEFT JOIN

             (select store_id
                   , count(customer_id) as sum
              from have
              group by store_id
             ) as t2

        ON t1.store_id = t2.store_id
    ;
quit;

If you wanted to do this in a more SASsy way, you can run proc means and merge together the results from a single dataset that holds everything you need. proc means will calculate all possible combinations of your variables by default.

proc means data=have noprint;
    class store_id active;
    ways 1 2;

    output out=want_total
        n(customer_id) = total
    ;
run;

data want;
    merge want_total(where=(_TYPE_ = 3) rename=(total = nbr) )
          want_total(where=(_TYPE_ = 2) rename=(total = sum) keep=_TYPE_ store_id total)
    ;
    by store_id;

    drop _:;
run;

Or, in SQL:

proc sql;
    create table want as
        select t1.store_id
             , t1.active
             , t1.total as nbr
             , t2.total as sum
        from want_total as t1
        LEFT JOIN
             want_total as t2
        ON t1.store_id = t2.store_id
        where     t1._TYPE_ = 3
              AND t2._TYPE_ = 2
    ;
quit;

The _TYPE_ variable identifies the level of the analysis. For example, _TYPE_ = 1 is for active only, _TYPE_ = 2 is for store_id only, and _TYPE_ = 3 is for all combinations. You can view this in the output dataset from proc means:

store_id    active  _TYPE_  _FREQ_  total
.           0       1        3      3
.           1       1        7      7
1           .       2        6      6
2           .       2        4      4
1           0       3        1      1
1           1       3        5      5
2           0       3        2      2
2           1       3        2      2

And if you wanted faster high-performance results, check out its big sibling, proc hpsummary.

Therein lies the cool thing about SAS: You can bounce between PROCs, SQL, the DATA Step, and Python via Pandas/proc python. You can exploit the unique benefits of each of these methods and thought processes for any number of data engineering and statistics problems.

How to write proc sql without windowfunction over partition by sum?

Answers (2)

Related Questions