Reputation: 1
I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701
I want to achieve this task using SAS programming. Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
Upvotes: 0
Views: 64
Reputation: 3845
data want (keep=Member_Id Coverage_Start Termination_Date);
merge have (rename=(Termination_Date=Expiration_Date))
have (firstobs=2 keep=Member_Id Termination_Date
rename=(Member_Id = Next_Id Termination_Date=Expiration_Next));
format Termination_Date date9.;
retain Termination_Date;
if Member_Id ne lag(Member_Id)
or lag(Coverage_Start) gt Expiration_Date + 1
then Termination_Date = Expiration_Date;
if_ = Next_Id ne Member_Id
or Expiration_Next lt Coverage_Start - 1;
run;
Upvotes: 0
Reputation: 413
Try this:
Data have;
infile datalines delimiter=',';
input Coverage_Start date9. Termination_Date date9. Member_Id $;
format Coverage_Start Termination_Date ddmmyyp10.;
datalines;
24-Jul-19,1-Jun-21,42968701
24-Jul-19,1-Mar-21,42968701
29-Feb-20,1-Mar-20,42968701
16-Feb-19,1-Mar-19,42968701
1-Mar-17,1-Mar-18,42968701
1-Mar-16,1-Mar-17,42968701
1-Dec-15,31-Dec-16,42968701
;
Run;
Proc sort data=have;
By member_id Coverage_Start Termination_Date;
Run;
Data _temp;
Set have;
By member_id;
Retain max_term_date;
If first.member_id then do;
count_id = 1;
max_term_date = .;
End;
Else if Coverage_Start > max_term_date then count_id + 1;
max_term_date = max(Termination_Date,max_term_date);
Run;
Proc sql;
Create table want(drop=count_id) as
Select member_id,count_id
,min(Coverage_Start) as Coverage_Start format=date9.
,max(Termination_Date) as Termination_Date format=date9.
From _temp
Group by 1,2;
Quit;
Upvotes: 0