amy
amy

Reputation: 11

SAS Retain statement - how to retain previous non missing values into new column for comparison

I am simply looking to create a new column retaining the value of a specific column within the previous record, so that I can compare the existing column against the new column. Further down the line I want to be able to output records whereby the values in both columns are different and lose the records where the values are the same

Essentially I want my dataset to look like this, where the first idr has the retained date set to null:

Idr Date1           Date2
1   20/01/2016  .
1   20/01/2016  20/01/2016
1   18/10/2016  20/01/2016
2   07/03/2016  .
2   18/05/2016  07/03/2016
2   21/10/2016  18/05/2016
3   29/01/2016  . 
3   04/02/2016  29/01/2016
3   04/02/2016  04/02/2016

I have used code along the following lines in the past whereby I have created a temporary variable referencing the data I want to retain:

date_temp=date1;
    data example2;
    set example1;
    by idr date1;
    date_temp=date1;
retain date_temp ;
    if first.idr then do;
        date_temp=date1;
    end;else do;
        date2=date_temp;
    end;
 run;

I have searched the highs and lows of google - Any help would be greatly appreciated

Upvotes: 0

Views: 8695

Answers (2)

user667489
user667489

Reputation: 9569

The trick here is to set the value of the retained variable ready for the next row after you've already output the current row, rather than relying on the default implicit output at the end of the data step:

data example2;
    set example1;
    by idr;
    retain date2;
    if first.idr then call missing(date2);
    output;
    date2 = date1;
    format date2 ddmmyy10.;
 run;

Logic that executes after the output statement doesn't make any difference to the row that's just been output, but until the data step proceeds to the next iteration, everything is still in the PDV, including variables that are being retained across to the next row, so this is an opportunity to update them. More details on the PDV.

Another way of doing this, using the lag function:

data example3;
    set example1;
    date2 = lag(date1);
    if idr ne lag(idr) then call missing(date2);
run;

Be careful when using lag - it returns the value from the last time that instance of the lag function was executed during your data step, not necessarily the value of that variable from the previous row, so weird things tend to happen if you do something like if condition then mylaggedvar=lag(var);

Upvotes: 2

Longfish
Longfish

Reputation: 7602

To achieve your final outcome (remove records where the idr and date are the same as the previous row), you can easily achieve this without creating the extra column. Providing the data is sorted by idr and date1, then just use first.date1 to keep the required records.

data have;
input Idr Date1 :ddmmyy10.;
format date1 ddmmyy10.;
datalines;
1   20/01/2016
1   20/01/2016
1   18/10/2016
2   07/03/2016
2   18/05/2016
2   21/10/2016
3   29/01/2016
3   04/02/2016
3   04/02/2016
;
run;

data want;
set have;
by idr date1;
if first.date1 then output;
run;

Upvotes: 0

Related Questions