Francesco Grassano
Francesco Grassano

Reputation: 33

Retrive data by row with lag function

Good morning. I've this dataset:

Appendix | Change_Serial_Number| Status     | Duration | Mileage  | Service
20101234        0                   .            60       120000       Z
20101234        1                 Proposed       48       110000       Z
20101234        2                 Activated      24        90000       Z
20101234        3                 Proposed       60       120000       Z
20101234        4                 Proposed       50       160000       B
20101234        5                 Activated      36       110000       B

Each row is a variation that could be activated or only proposed with the first row with status like blank or the previously activated variation. I need to have this table:

Appendix | Change_Serial_Number| Status     | Duration | Mileage  | Service |Duration_Prev| Mileage_prev |
20101234        0                   .            60       120000       Z        .
20101234        1                 Proposed       48       110000       Z        60              120000
20101234        2                 Activated      24        90000       Z        60              120000
20101234        3                 Proposed       60       120000       Z        24              90000
20101234        4                 Proposed       50       160000       B        24              90000
20101234        5                 Activated      36       110000       B        24              90000

I need to compare the duration, mileage and service of each variation with the previously activated or with the initial condition only if there aren't variation activated.

I tried with lag function to retrieve a data of previous row, but i need to retrieve data of 3 field and retrieve data only from the last activated variation or, if there aren't, from the initial condition.

I used this code:

proc sort data=db_rdg;
       by Appendix Change_Serial_Number descending Change_Serial_Number;
  run;

  data db_rdg2;
       set db_rdg;
  by Appendix;
  Duration_prev=lag(Duration);
  if first. Appendix then Durata_prev =.;
  run;

With this code, i can retrieve a data only from the previously row (not from the previosly actived row or from the first condition) and only for a duration variable (not at the same time for duration, mileage and service).

I hope I have been clear enough :)

Thank you for your help!

Upvotes: 0

Views: 70

Answers (2)

Chris Long
Chris Long

Reputation: 1319

The lag() function is only really useful for working with values from a specific number of observations earlier. In this case, you don't know whether the values you want to work with are from the previous observation or from five or six observations earlier, so instead of using lag(), you should RETAIN the additional variables and update their values when appropriate:

data db_rdg2;
  retain duration_prev .;
  set db_rdg;
  by Appendix;
  if first.Appendix or status = 'Activated' then duration_prev = duration;
run;

The RETAIN statement allows duration_prev to retain its value as each new observation in read from the input, instead of being reset to missing.

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214163.htm

Upvotes: 1

Richard
Richard

Reputation: 27508

Instead of using LAG to retrieve the duration from the prior row, you will want to store the activate state tracking variables (for duration, mileage and serial) in a variable that is retained and updated after an explicit output.

In these two sample codes I tossed in tracking serial as you may want to know # of changes from prior activate.

data have; input
Appendix   Change_Serial_Number   Status $     Duration  Mileage   Service $;
datalines;
20101234        0                   .            60       120000       Z
20101234        1                 Proposed       48       110000       Z
20101234        2                 Activated      24        90000       Z
20101234        3                 Proposed       60       120000       Z
20101234        4                 Proposed       50       160000       B
20101234        5                 Activated      36       110000       B
run;

* NOTE: _APA suffix means @ prior activate;

* version 1;
* implicit loop with by group processing means ;
* explicit first. test needed in order to reset the apa tracking variables;

data want;
  set have;
  by appendix;

  if first.appendix then do;
     length csn_apa dur_apa mil_apa 8;
     call missing(csn_apa, dur_apa, mil_apa);    
  end;

  output;

  if status in (' ' 'Activate') then do;
    csn_apa = change_serial_number;
    dur_apa = duration;
    mil_apa = mileage;
  end;

  retain csn_apa dur_apa mil_apa;
run;

* version 2;
* DOW version;
* explicit loop over group means first. handling not explicitly needed;
* implicit loop performs tracking variable resets;
* retain not needed because output and tracking variables modified;
* within current iteration of implicit loop;

data want2;
  do until (last.appendix);
    set have;
    by appendix;

    output;

    if status in (' ' 'Activate') then do;
      csn_apa = change_serial_number;
      dur_apa = duration;
      mil_apa = mileage;
    end;
  end;
run;

Upvotes: 0

Related Questions