Mobix
Mobix

Reputation: 103

Retained variable not being retained

I am trying to sum up a variable using retain based on certain conditions. I have these fields account_id, date, transaction, value and transaction_type.

The data is sorted by account_id and date

I want to sum up value until the first transaction_type not in ('A')

I have this

data dset; 
    set dset;
    by account_id;
    retain sum_flag sum;
    if first.account_id then do;
        sum_flag = 1;
        sum=0;
    end;
    if transaction_type not in ('A') then sum_flag = 0;
    if sum_flag = 1 then sum=sum +value;
run;

My issue is if the first transaction_type in A then it works fine but if the next transaction_type is also A sum_flag is set to . instead of being kept as 1. Why is this happening?

I think it's something to do with if transaction_type not in ('A') then sum_flag = 0; Without it the variable sum_flag is not being reset to missing if the transaction is in 'A' but I only want the value retained until the first non 'A' transaction.

Upvotes: 0

Views: 349

Answers (2)

Robert Penridge
Robert Penridge

Reputation: 8513

Two things could be happening here.

  1. You output dataset name is the same as your input dataset name (bad practice - don't do this). If you ran it once with a mistake, it may have changed the values in the table to something unexpected, which are then making the corrected code look like it's failing. To fix this simply re-run from the beginning and change the output dataset name to something else.
  2. Your value field may sometimes contain nulls? If so the + operator is causing your problems. The + operator will always return NULL if one of it's arguments are null. Instead use the sum() function. i.e. if sum_flag = 1 then sum=sum(sum,value);

Upvotes: 0

Richard
Richard

Reputation: 27508

Tom is probably correct that the inbound dset has extra variables from your earlier iterations at trying to solve this problem.

A DOW loop approach also works and does not require a retain or an initializing if.

data dset;
input group type $ value;
datalines;
1 A 1
1 A 2
1 A 3
1 A 4
1 B -5
1 A 8
2 A 0
2 A 9
2 A 33
3 B 9
;
run;

data dset;
  sum = .; _summing=1;
  do until (last.group);
    set dset;
    by group;

    _summing = _summing and (type = 'A');
    if _summing then sum + value;

    output;
  end;
  drop _:; * dont keep state variables;
run;

Upvotes: 0

Related Questions