QuantumGorilla
QuantumGorilla

Reputation: 603

Conditional IF statement with lagged values

I have the following dataset, structured as follows:

DATE           PERCENTAGE    FLAG    VALUE1  
01JAN2017        0.21          1      1.50   
04JAN2017        0.05          0      2.43
09JAN2017        0.06          1      2.21
24JAN2017        0.15          1      1.13

I have to add new variables to the dataset such that those variables will fulfill the following condition:

  1. in the case FLAG is equal to 1 and it's the first row then:
NEW_VAR_1 is equal to 500 * PERCENTAGE;
NEW_VAR_2 is equal to NEWVAR_1 * (VALUE1 - 1);
NEW_VAR_3 is equal to 500 + NEWVAR_2;
  1. In the case FLAG is equal to 1 and it's not the first row then:
NEW_VAR_1 is equal to LAG(NEWVAR_3) * PERCENTAGE;
NEW_VAR_2 is equal to NEWVAR_1 * (VALUE1 - 1);
NEW_VAR_3 is equal to LAG(NEWVAR_3) + NEWVAR_2;
  1. In the case FLAG is equal to 0 then all the NEWVAR_ values have to be set on missing.

I need to run this script on SAS and I write the following script down to do that:

DATA BACKTESTING;
    SET BACKTESTING;
    IF _N_ EQ 1 AND FLAG EQ 1 THEN DO;
        K = 500;
        NEWVAR_1 = PERCENTAGE * K;
        NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
        NEWVAR_3 = K + NEWVAR_2;
    END;
    ELSE IF _N_ GT 1 AND FLAG EQ 1 THEN DO;
        NEWVAR_1 = PERCENTAGE * LAG(NEWVAR_3);
        NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
        NEWVAR_3 = K + NEWVAR_2; 
        END;
    END;
RUN;

The script works correctly, in the sense I did not see error or warning message in the log window, but, as you can note by reading the script, it returns missing values when it finds a missing values in the lagged variables.

Is there a way to overcome such problem, in order to be able letting SAS takes the lag of NEWVAR_3 only when FLAG is equal to 1?

In the hope I've been clear enough in the question, thanks all in advance for the help!

Upvotes: 1

Views: 1090

Answers (2)

Tom
Tom

Reputation: 51621

LAG() returns the value from the previous time that it ran. If you conditionally call LAG() then it will have a spotted recorded list of values.

Your logic can be simplified quite a lot.

DATA BACKTESTING;
  SET BACKTESTING;
  k = lag(newvar_3);
  IF _N_ EQ 1 then k=500 ;
  if FLAG EQ 1 THEN DO;
    NEWVAR_1 = PERCENTAGE * K;
    NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
    NEWVAR_3 = K + NEWVAR_2;
  END;
RUN;

But if NEWVAR_3 is really a "new" variable then it will be missing every time lag(newvar_3) runs and your lagged values will always be missing. In that case you need to RETAIN the value from the previous observation.

DATA BACKTESTING;
  SET BACKTESTING;
  retain k 500 ;
  if FLAG EQ 1 THEN DO;
    NEWVAR_1 = PERCENTAGE * K;
    NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
    NEWVAR_3 = K + NEWVAR_2;
    k = newvar_3 ;
  END;
RUN;

Upvotes: 1

Allan Bowe
Allan Bowe

Reputation: 12701

The problem with lag is that it does not actually read the previous value. Instead, it adds the current value to a hidden array every time it is called - which is then retrieved in subsequent calls.

So - if you do not call lag on every iteration (or call it twice) you will get unexpected results.

One way to avoid this is to use a simple retain, eg as follows:

DATA BACKTESTING;
  SET BACKTESTING;
  IF _N_ EQ 1 AND FLAG EQ 1 THEN DO;
    K = 500;
    NEWVAR_1 = PERCENTAGE * K;
    NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
    NEWVAR_3 = K + NEWVAR_2;
  END;
  ELSE IF _N_ GT 1 AND FLAG EQ 1 THEN DO;
    NEWVAR_1 = PERCENTAGE * LAG_NEWVAR_3;
    NEWVAR_2 = NEWVAR_1 * (VALUE_1 - 1);
    NEWVAR_3 = K + NEWVAR_2; 
    END;
  END;
  /* create temp retain variable */
  retain LAG_NEWVAR_3 0;
  drop LAG_NEWVAR_3;
  LAG_NEWVAR_3=NEWVAR_3;
RUN;

Documentation for lag function: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm

Upvotes: 2

Related Questions