Reputation: 603
I have the following dataset, structured as follows:
DATE PERCENTAGE FLAG VALUE1
01JAN2017 0.21 1 1.50
04JAN2017 0.05 0 2.43
09JAN2017 0.06 1 2.21
24JAN2017 0.15 1 1.13
I have to add new variables to the dataset such that those variables will fulfill the following condition:
FLAG
is equal to 1 and it's the first row then:NEW_VAR_1 is equal to 500 * PERCENTAGE; NEW_VAR_2 is equal to NEWVAR_1 * (VALUE1 - 1); NEW_VAR_3 is equal to 500 + NEWVAR_2;
FLAG
is equal to 1 and it's not the first row then:NEW_VAR_1 is equal to LAG(NEWVAR_3) * PERCENTAGE; NEW_VAR_2 is equal to NEWVAR_1 * (VALUE1 - 1); NEW_VAR_3 is equal to LAG(NEWVAR_3) + NEWVAR_2;
FLAG
is equal to 0 then all the NEWVAR_
values have to be set on missing.I need to run this script on SAS and I write the following script down to do that:
DATA BACKTESTING;
SET BACKTESTING;
IF _N_ EQ 1 AND FLAG EQ 1 THEN DO;
K = 500;
NEWVAR_1 = PERCENTAGE * K;
NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
NEWVAR_3 = K + NEWVAR_2;
END;
ELSE IF _N_ GT 1 AND FLAG EQ 1 THEN DO;
NEWVAR_1 = PERCENTAGE * LAG(NEWVAR_3);
NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
NEWVAR_3 = K + NEWVAR_2;
END;
END;
RUN;
The script works correctly, in the sense I did not see error or warning message in the log window, but, as you can note by reading the script, it returns missing values when it finds a missing values in the lagged variables.
Is there a way to overcome such problem, in order to be able letting SAS takes the lag of NEWVAR_3
only when FLAG
is equal to 1?
In the hope I've been clear enough in the question, thanks all in advance for the help!
Upvotes: 1
Views: 1090
Reputation: 51621
LAG()
returns the value from the previous time that it ran. If you conditionally call LAG()
then it will have a spotted recorded list of values.
Your logic can be simplified quite a lot.
DATA BACKTESTING;
SET BACKTESTING;
k = lag(newvar_3);
IF _N_ EQ 1 then k=500 ;
if FLAG EQ 1 THEN DO;
NEWVAR_1 = PERCENTAGE * K;
NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
NEWVAR_3 = K + NEWVAR_2;
END;
RUN;
But if NEWVAR_3 is really a "new" variable then it will be missing every time lag(newvar_3)
runs and your lagged values will always be missing. In that case you need to RETAIN the value from the previous observation.
DATA BACKTESTING;
SET BACKTESTING;
retain k 500 ;
if FLAG EQ 1 THEN DO;
NEWVAR_1 = PERCENTAGE * K;
NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
NEWVAR_3 = K + NEWVAR_2;
k = newvar_3 ;
END;
RUN;
Upvotes: 1
Reputation: 12701
The problem with lag
is that it does not actually read the previous value. Instead, it adds the current value to a hidden array every time it is called - which is then retrieved in subsequent calls.
So - if you do not call lag
on every iteration (or call it twice) you will get unexpected results.
One way to avoid this is to use a simple retain
, eg as follows:
DATA BACKTESTING;
SET BACKTESTING;
IF _N_ EQ 1 AND FLAG EQ 1 THEN DO;
K = 500;
NEWVAR_1 = PERCENTAGE * K;
NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
NEWVAR_3 = K + NEWVAR_2;
END;
ELSE IF _N_ GT 1 AND FLAG EQ 1 THEN DO;
NEWVAR_1 = PERCENTAGE * LAG_NEWVAR_3;
NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
NEWVAR_3 = K + NEWVAR_2;
END;
END;
/* create temp retain variable */
retain LAG_NEWVAR_3 0;
drop LAG_NEWVAR_3;
LAG_NEWVAR_3=NEWVAR_3;
RUN;
Documentation for lag function: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm
Upvotes: 2