Reputation: 127
The goal is to add a new row whenever there is a gap between the date variable between two rows grouped by id. If the gap occurs, then duplicate a row that is first. However only the date feature should not be as the first row rather it should be incremented by one day.
Also, everything needs to be grouped by id. I need to achieve it without expanding the function.
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data sample;
set sample;
format date yymmdd10.;
run;
The desired result:
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;
data sample;
set sample;
format date yymmdd10.;
run;
Upvotes: 0
Views: 67
Reputation: 27508
You can perform a 1:1 self merge with the second self starting at row 2 in order to provide a lead value. A 1:1 merge does not use a BY
statement.
Example:
data have;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data want;
* 1:1 merge without by statement;
merge
have /* start at row 1 */
have ( firstobs=2 /* start at row 2 for lead values */
keep=id date /* more data set options that prepare the lead */
rename = ( id=nextid
date=nextdate
))
;
output;
flag = '*'; /* marker for filled in dates */
if id = nextid then
do date=date+1 to nextdate-1;
output;
end;
drop next:;
run;
Result flagging filled in dates
Upvotes: 1
Reputation: 51601
To "look ahead" you can re-read the same dataset starting from the second observation. SAS will stop when you read past the end of the input so add an extra empty observation.
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd.;
format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data want;
set sample;
by id;
set sample(firstobs=2 keep=date rename=(date=next_date)) sample(obs=1 drop=_all_);
output;
if not last.id then do date=date+1 to next_date-1; output; end;
run;
Results:
numeric_ character_
Obs id date feature feature next_date
1 1 2020-01-01 5 A 2020-01-02
2 1 2020-01-02 3 Z 2020-01-04
3 1 2020-01-03 3 Z 2020-01-04
4 1 2020-01-04 2 D 2020-01-05
5 1 2020-01-05 7 B 2020-01-01
6 2 2020-01-01 4 V 2020-01-03
7 2 2020-01-02 4 V 2020-01-03
8 2 2020-01-03 1 B 2020-01-05
9 2 2020-01-04 1 B 2020-01-05
10 2 2020-01-05 9 F .
Upvotes: 1