Neoleogeo
Neoleogeo

Reputation: 323

Cummulative sum up to a date

I have a table with 4 columns. that looks something like this.

 +-------------------------------------------------------------+
 |         ID      |      date |  a1      |         a2         |
 +-------------------------------------------------------------+
 |1                | 31AUG2015 | 100      |        70
 +-------------------------------------------------------------+
 |1                | 01SEPT2015| 150      |        80
 +-------------------------------------------------------------+
 |2                | 31AUG2015 | 900      |        0
 +-------------------------------------------------------------+
 |2                | 01SEPT2015| 150      |        100
 +-------------------------------------------------------------+

I want to add a1 and a2 for all rows before and up to that date, so I have something more like this:

 +-------------------------------------------------------------+
 |         ID      |      date |  a1      |         a2         |
 +-------------------------------------------------------------+
 |1                | 31AUG2015 | 100      |        70
 +-------------------------------------------------------------+
 |1                | 01SEPT2015| 250      |        150
 +-------------------------------------------------------------+
 |2                | 31AUG2015 | 900      |        0
 +-------------------------------------------------------------+
 |2                | 01SEPT2015| 1050     |        100
 +-------------------------------------------------------------+

This was my attempt. A self join up to that date:

proc sql;
create table want as
select
    a.id
    ,a.date
    ,sum(a.a1)
    ,sum(a.a2)
from 
    have a,
    have b
where 
    a.id = b.id and
    a.dt <=  b.dt

group by
    a.id
    ,a.date

quit;

The results are a mess and nothing lines up with my expectations. I am sure I am making a big mistake somewhere, but I would appreciate some guidance with either fixes to the proc sql or a data step.

Upvotes: 0

Views: 56

Answers (3)

Reeza
Reeza

Reputation: 21294

Data steps are much easier for this type of logic. This would create the value in a new variable a3 - that gets renamed, currently commented out so you can see the logic and verify it.

data want /*(rename= (a3=a2)) */;
   set have;
   by ID date; *assumes correct ordering of data;
   if first.id then a3 = a2;
   else a3 + a2;
   *drop a2;
run;

First.ID resets the cumulative total, otherwise it continues to add using a3 + a2. This is known as a SUM statement which has an implicit RETAIN for the variable a3 which means the value is kept across rows.

Upvotes: 3

Stu Sztukowski
Stu Sztukowski

Reputation: 12934

A data step will make it easy. Sort it by your id and date, then use a sum statement to accumulate your values.

proc sort data=have;
     by id date;
run;

data want;
   set have;
   by id date;

   /* Reset cumulative sum at the start of each ID */
   if(first.id) then call missing(a1_cume, a2_cume);

   a1_cume+a1;
   a2_cume+a2;
run;

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1271151

One method is a correlated subquery:

proc sql;
    select h.*,
           (select sum(h2.a1)
            from have h2
            where h2.id = h.id and h2.date <= h.date
           ) as running_a1,
           (select sum(h2.a2)
            from have h2
            where h2.id = h.id and h2.date <= h.date
           ) as running_a2
    from h2;

That said, if you are using pass-thru SQL, you should use window functions:

sum(a1) over (partition by id order by date)

If you are using SQL, you should probably use a data step with retain.

Upvotes: 0

Related Questions