Group By Interpolation Based on the Previous Row

Question

The goal is to add a new row whenever there is a gap between the date variable between two rows grouped by id. If the gap occurs, then duplicate a row that is first. However only the date feature should not be as the first row rather it should be incremented by one day.

Also, everything needs to be grouped by id. I need to achieve it without expanding the function.

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

The desired result:

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

Richard · Accepted Answer

You can perform a 1:1 self merge with the second self starting at row 2 in order to provide a lead value. A 1:1 merge does not use a BY statement.

Example:

data have;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
      format date yymmdd10.;
    datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data want;
  * 1:1 merge without by statement;
  merge 
    have              /* start at row 1 */    
    have ( firstobs=2 /* start at row 2 for lead values */
           keep=id date            /* more data set options that prepare the lead */
           rename = ( id=nextid
                      date=nextdate
         )) 
  ;

  output;

  flag = '*';                  /* marker for filled in dates */
  if id = nextid then
    do date=date+1 to nextdate-1;
      output;
    end;

  drop next:;
run;

Result flagging filled in dates

Group By Interpolation Based on the Previous Row

Answers (2)

Related Questions