Loncar
Loncar

Reputation: 127

Group By Interpolation Based on the Previous Row

The goal is to add a new row whenever there is a gap between the date variable between two rows grouped by id. If the gap occurs, then duplicate a row that is first. However only the date feature should not be as the first row rather it should be incremented by one day.

Also, everything needs to be grouped by id. I need to achieve it without expanding the function.

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

The desired result:

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

Upvotes: 0

Views: 67

Answers (2)

Richard
Richard

Reputation: 27508

You can perform a 1:1 self merge with the second self starting at row 2 in order to provide a lead value. A 1:1 merge does not use a BY statement.

Example:

data have;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
      format date yymmdd10.;
    datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data want;
  * 1:1 merge without by statement;
  merge 
    have              /* start at row 1 */    
    have ( firstobs=2 /* start at row 2 for lead values */
           keep=id date            /* more data set options that prepare the lead */
           rename = ( id=nextid
                      date=nextdate
         )) 
  ;

  output;

  flag = '*';                  /* marker for filled in dates */
  if id = nextid then
    do date=date+1 to nextdate-1;
      output;
    end;

  drop next:;
run;

Result flagging filled in dates

enter image description here

Upvotes: 1

Tom
Tom

Reputation: 51601

To "look ahead" you can re-read the same dataset starting from the second observation. SAS will stop when you read past the end of the input so add an extra empty observation.

data sample;
  input id date numeric_feature character_feature $;
  informat date yymmdd.;
  format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data want;
  set sample;
  by id;
  set sample(firstobs=2 keep=date rename=(date=next_date)) sample(obs=1 drop=_all_);
  output;
  if not last.id then do date=date+1 to next_date-1; output; end;
run;

Results:

                           numeric_    character_
Obs    id       date        feature     feature       next_date

  1     1    2020-01-01        5           A         2020-01-02
  2     1    2020-01-02        3           Z         2020-01-04
  3     1    2020-01-03        3           Z         2020-01-04
  4     1    2020-01-04        2           D         2020-01-05
  5     1    2020-01-05        7           B         2020-01-01
  6     2    2020-01-01        4           V         2020-01-03
  7     2    2020-01-02        4           V         2020-01-03
  8     2    2020-01-03        1           B         2020-01-05
  9     2    2020-01-04        1           B         2020-01-05
 10     2    2020-01-05        9           F                  .

Upvotes: 1

Related Questions