Reputation: 323
I have a table with 4 columns. that looks something like this.
+-------------------------------------------------------------+
| ID | date | a1 | a2 |
+-------------------------------------------------------------+
|1 | 31AUG2015 | 100 | 70
+-------------------------------------------------------------+
|1 | 01SEPT2015| 150 | 80
+-------------------------------------------------------------+
|2 | 31AUG2015 | 900 | 0
+-------------------------------------------------------------+
|2 | 01SEPT2015| 150 | 100
+-------------------------------------------------------------+
I want to add a1 and a2 for all rows before and up to that date, so I have something more like this:
+-------------------------------------------------------------+
| ID | date | a1 | a2 |
+-------------------------------------------------------------+
|1 | 31AUG2015 | 100 | 70
+-------------------------------------------------------------+
|1 | 01SEPT2015| 250 | 150
+-------------------------------------------------------------+
|2 | 31AUG2015 | 900 | 0
+-------------------------------------------------------------+
|2 | 01SEPT2015| 1050 | 100
+-------------------------------------------------------------+
This was my attempt. A self join up to that date:
proc sql;
create table want as
select
a.id
,a.date
,sum(a.a1)
,sum(a.a2)
from
have a,
have b
where
a.id = b.id and
a.dt <= b.dt
group by
a.id
,a.date
quit;
The results are a mess and nothing lines up with my expectations. I am sure I am making a big mistake somewhere, but I would appreciate some guidance with either fixes to the proc sql or a data step.
Upvotes: 0
Views: 56
Reputation: 21294
Data steps are much easier for this type of logic. This would create the value in a new variable a3 - that gets renamed, currently commented out so you can see the logic and verify it.
data want /*(rename= (a3=a2)) */;
set have;
by ID date; *assumes correct ordering of data;
if first.id then a3 = a2;
else a3 + a2;
*drop a2;
run;
First.ID resets the cumulative total, otherwise it continues to add using a3 + a2. This is known as a SUM statement which has an implicit RETAIN for the variable a3 which means the value is kept across rows.
Upvotes: 3
Reputation: 12934
A data step will make it easy. Sort it by your id and date, then use a sum statement to accumulate your values.
proc sort data=have;
by id date;
run;
data want;
set have;
by id date;
/* Reset cumulative sum at the start of each ID */
if(first.id) then call missing(a1_cume, a2_cume);
a1_cume+a1;
a2_cume+a2;
run;
Upvotes: 3
Reputation: 1271151
One method is a correlated subquery:
proc sql;
select h.*,
(select sum(h2.a1)
from have h2
where h2.id = h.id and h2.date <= h.date
) as running_a1,
(select sum(h2.a2)
from have h2
where h2.id = h.id and h2.date <= h.date
) as running_a2
from h2;
That said, if you are using pass-thru SQL, you should use window functions:
sum(a1) over (partition by id order by date)
If you are using SQL, you should probably use a data step with retain
.
Upvotes: 0