Reputation: 13
I'm having an issue building a code with several IF THEN DO statements and also with the LAG function. I'm using the SAS Enterprise Guide tool and coding in SQL.
The goal of the code is to classify deposits made in ATM's as 'logged deposits', 'deposits in it's own account' and 'non-logged deposits'
The classification occurs in the DETALHE column.
When CHAVE = lag(CHAVE)
and we have a deposit after a login (DES_TRANS column), we have a 'deposit in it's own account'.
When CHAVE
not equal lag(CHAVE)
we need to take a look at the column 'TEMPO' which means TIME. If TIME it's smaller than 200 seconds (means timeout), we have a 'logged deposit'. If it's bigger than 200 seconds, a 'non-logged deposit.'
The issue is when we have successive deposits and we need to use the value that was classified by the firsts IF-THEN statements (I'm using LAG(DETALHE)
for that).
First, the code wasn't classifying all lines. Lines 287 and 288 for example.
I solved that with a RETAIN statement. Now, it's just not giving the correct output
I expect the output of line 490 to be 'Depósito Não-Logado'.
FORMAT DETALHE $CHAR50.;
RETAIN DETALHE;
IF DES_TRANS ne 'Depósito' THEN DO DETALHE = ' '; END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Login' AND CHAVE = lag(CHAVE)) THEN DO DETALHE = 'Depósito na Própria Conta';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Login' AND CHAVE ne lag(CHAVE) AND TEMPO <=200) THEN DO DETALHE = 'Depósito Logado';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Login' AND CHAVE ne lag(CHAVE) AND TEMPO >=200) THEN DO DETALHE = 'Depósito Não-Logado';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Depósito' AND lag(DETALHE) = 'Depósito Não-Logado') THEN DO DETALHE = 'Depósito Não-Logado';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Depósito' AND lag(DETALHE) = 'Depósito na Própria Conta' AND CHAVE = lag(CHAVE)) THEN DO DETALHE = 'Depósito na Própria Conta';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Depósito' AND lag(DETALHE) = 'Depósito na Própria Conta' AND CHAVE ne lag(CHAVE) AND TEMPO <=200) THEN DO DETALHE = 'Depósito Logado';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Depósito' AND lag(DETALHE) = 'Depósito na Própria Conta' AND CHAVE ne lag(CHAVE) AND TEMPO >=200) THEN DO DETALHE = 'Depósito Não-Logado';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Depósito' AND lag(DETALHE) = 'Depósito Logado' AND TEMPO <=200) THEN DO DETALHE = 'Depósito Logado';END;
IF (DES_TRANS = 'Depósito' AND lag(DES_TRANS) = 'Depósito' AND lag(DETALHE) = 'Depósito Logado' AND TEMPO >=200) THEN DO DETALHE = 'Depósito Não-Logado';END;
RUN;
Upvotes: 1
Views: 239
Reputation: 27498
The possibility of having to deal with multiple accounts in one data step is high, and thus you would also want a BY AcccountId;
statement. A LAG
statement does not 'reset' at the by-group boundary, so when there a multiple accounts, the LAG
at first record of the successive group will be looking at the last record of the prior group.
Each use of a LAG
function, internally, creates an implicit in-line
stack at the point of coding, and rarely would be used within a conditional DO
block. LAG
will only stack when flow control reaches it -- a LAG
in a if ... and ...
logic expression is OK because SAS always fully evaluates all parts of a logic expression (no short cut evaluation as in C
or other languages)
The most probable problem point is at
if … and … and lag(DETALHE) = '...' then …
I presume DETALHE
is NOT in the SET
data set, and is being computed in the data step. At the code point of DETLAHE
being lagged referenced, all the prior assignments of it have been conditional, and thus could be missing (as DETALHE
would be implicity reset to missing at the top of the data step).
For non-trivial coding involving lagged values my personal style is to lag each variable into its own work variable. The algorithm can be made easier to understand and less prone to coding incorrect state classifications when the state contributing roles are clearly manifested.
For the case of wanting a lagged value of a computed value prior to computing the value, you indeed need to RETAIN
the variable holding that prior computed value.
For classification purposes, the boundary case should have a distinct rule.
* state at 200 is fast only because fast classification done last;
if X <= 200 then state='slow';
if X >= 200 then state=`fast';
* better rule;
if X < 200 then state='slow'; * exclusively < 200;
For the case of CHAVE
being the group key (instead of presumed AccountId
), the code could be simplified some. Nested logic (instead of a series of statements) will help cover the full spate of combinatoric factors contributing to computing the DETALHE
classification - a series of disjoint IF/THEN is ok if they fully cover the combinatoric space and their logic evals do not over lap (fully distinct). It's a matter of the code expressing a balance of complexity, consistency and completeness.
…
SET …
BY AccountId;
…
prior_DES_TRANS = lag(DES_TRANS);
prior_CHAVE = lag(CHAVE);
prior_CLASSIFICATION = lag(CLASSIFICATION);
length DETALHE prior_DETALHE $30;
retain prior_DETALHE;
if first.AccountID then do; /* do not lag into prior group */
prior_DES_TRANS = ' ';
prior_CHAVE = ' ';
prior_CLASSIFICATION = ' ';
prior_DETALHE = ' ';
end;
if DES_TRANS = 'Depósito' then do;
sequenceLoginDeposit = prior_DES_TRANS = 'Login';
sequenceDepositDeposit = prior_DES_TRANS = 'Depósito';
end;
sameCHAVE = CHAVE = prior_CHAVE;
* use nested logic to prevent missing some combination of factors that
* contribute to DETALHE classification;
if sequenceLoginDeposit then do;
if sequenceLoginDeposit then do;
if sameCHAVE then
DETALHE = 'Depósito na Própria Conta';
else /* not sameCHAVE */
if TEMPO < 200 THEN
DETALHE = 'Depósito Logado';
else /* tempo >= 200 */
DETALHE = 'Depósito Não-Logado';
end;
else do;
end;
end;
else
if sequenceDepositDeposit then do;
if prior_DETALHE = 'Depósito Não-Logado' then
DETALHE = 'Depósito Não-Logado';
else
IF prior_DETALHE = 'Depósito na Própria Conta' then do;
if sameCHAVE THEN
DETALHE = 'Depósito na Própria Conta';
else /* not sameCHAVE */
IF TEMPO < 200 THEN
DETALHE = 'Depósito Logado';
ELSE /* and TEMPO >=200 */
DETALHE = 'Depósito Não-Logado';
end;
else
IF prior_DETALHE = 'Depósito Logado' then do;
/* is sameCHAVE important here ? */
if TEMPO < 200 THEN
DETALHE = 'Depósito Logado';
else
DETALHE = 'Depósito Não-Logado';
end;
ELSE
DETALHE = 'unhandled';
end;
prior_DETALHE = DETALHE;
Upvotes: 2