Reputation: 103
I am trying to sum up a variable using retain based on certain conditions. I have these fields account_id, date, transaction, value and transaction_type.
The data is sorted by account_id and date
I want to sum up value until the first transaction_type not in ('A')
I have this
data dset;
set dset;
by account_id;
retain sum_flag sum;
if first.account_id then do;
sum_flag = 1;
sum=0;
end;
if transaction_type not in ('A') then sum_flag = 0;
if sum_flag = 1 then sum=sum +value;
run;
My issue is if the first transaction_type in A then it works fine but if the next transaction_type is also A sum_flag is set to . instead of being kept as 1. Why is this happening?
I think it's something to do with if transaction_type not in ('A') then sum_flag = 0; Without it the variable sum_flag is not being reset to missing if the transaction is in 'A' but I only want the value retained until the first non 'A' transaction.
Upvotes: 0
Views: 349
Reputation: 8513
Two things could be happening here.
value
field may sometimes contain nulls? If so the +
operator is causing your problems. The +
operator will always return NULL
if one of it's arguments are null. Instead use the sum()
function. i.e. if sum_flag = 1 then sum=sum(sum,value);
Upvotes: 0
Reputation: 27508
Tom is probably correct that the inbound dset has extra variables from your earlier iterations at trying to solve this problem.
A DOW loop approach also works and does not require a retain or an initializing if.
data dset;
input group type $ value;
datalines;
1 A 1
1 A 2
1 A 3
1 A 4
1 B -5
1 A 8
2 A 0
2 A 9
2 A 33
3 B 9
;
run;
data dset;
sum = .; _summing=1;
do until (last.group);
set dset;
by group;
_summing = _summing and (type = 'A');
if _summing then sum + value;
output;
end;
drop _:; * dont keep state variables;
run;
Upvotes: 0