Reputation: 175
I have a SAS dataset with STS and CHGDATE. I would like to create a New Date
variable NEWCHGDATE
while retaining the same Date until the same STS=03
continues.
ID CHGDATE STS
101 11/22/2022 03
101 11/23/2022 03
101 11/24/2022 03
101 03/08/2023 04
102 01/11/2022 03
102 01/12/2022 05
102 02/12/2023 03
102 02/15/2023 04
WANT
ID CHGDATE STS NEWCHGDATE
101 11/22/2022 03 11/22/2022
101 11/23/2022 03 11/22/2022
101 11/24/2022 03 11/22/2022
101 03/08/2023 04 03/08/2023
102 01/11/2022 03 01/11/2022
102 01/12/2022 05 01/12/2022
102 02/12/2023 03 02/12/2023
102 02/15/2023 04 02/15/2023
For e.g In the above, ID 101 has STS=03
from 11/22 until 11/24 so I would like to retain the first Date 11/22/2022 until it reached STS=04
on 03/08/2023.
Here is my Query which is not giving me the expected result
data want;
set have;
by id;
if first.id then do;
lag_sts = '';
newchgdate = chgdate;
end;
else lag_sts = sts;
if sts not in ('03') then newchgdate = chgdate;
else newchgdate = ifn(missing(lag_sts),., lag(newchgdate));
run;
Upvotes: 0
Views: 53
Reputation: 51566
That is just normal BY group processing. But you will need to use the NOTSORTED keyword on the BY statement since the values of STS are not sorted. You just need to RETAIN the new variable and set it when you start a new ID or start a new STS that has value of '03'.
data have;
input ID $ CHGDATE :yymmdd. STS $;
format chgdate yymmdd10.;
cards;
101 2022-11-22 03
101 2022-11-23 03
101 2022-11-24 03
101 2023-03-08 04
102 2022-01-11 03
102 2022-01-12 05
102 2023-02-12 03
102 2023-02-15 04
;
data want;
set have;
by id sts notsorted;
if first.id or (first.sts and sts='03') then newchgdate=chgdate;
retain newchgdate;
format newchgdate yymmdd10.;
run;
Result
Obs ID CHGDATE STS newchgdate
1 101 2022-11-22 03 2022-11-22
2 101 2022-11-23 03 2022-11-22
3 101 2022-11-24 03 2022-11-22
4 101 2023-03-08 04 2022-11-22
5 102 2022-01-11 03 2022-01-11
6 102 2022-01-12 05 2022-01-11
7 102 2023-02-12 03 2023-02-12
8 102 2023-02-15 04 2023-02-12
Upvotes: 0
Reputation: 27498
Use a RETAIN
statement to specify which variables will not get reset to missing at the top of the DATA Step implicit loop.
Use the LAG
function to examine the prior sts
value. Be careful about using LAG inside a conditional clause (which is not happening here). In this example the LAG is occurring in the test expression.
Example:
The test expression is a little tricky if you are new to SAS and DATA Step.
data have;
attrib id length=8 CHG_date length=8 informat=mmddyy10. format=mmddyy10. sts length=$3;
input id CHG_date sts ;
datalines ;
101 11/22/2022 03
101 11/23/2022 03
101 11/24/2022 03
101 03/08/2023 04
102 01/11/2022 03
102 01/12/2022 05
102 02/12/2023 03
102 02/15/2023 04
;
data want ;
set have ;
by id ;
attrib new_CHG_date length=8 format=mmddyy10. ;
retain new_CHG_date ;
if sts ne '03' or lag(sts) ne '03' or first.id then
new_CHG_date = CHG_date ;
run ;
Upvotes: 0