rahul gupta
rahul gupta

Reputation: 1

Reducing database with minimum entries

I have a dataset i.e. -

Coverage_Start  Termination_Date    Member_Id
24-Jul-19       1-Jun-21            42968701
24-Jul-19       1-Mar-21            42968701
29-Feb-20       1-Mar-20            42968701
16-Feb-19       1-Mar-19            42968701
1-Mar-17        1-Mar-18            42968701
1-Mar-16        1-Mar-17            42968701
1-Dec-15        31-Dec-16           42968701

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

Reduced Dataset should be like -

Coverage_Start  Termination_Date    Member_Id
24-Jun-19       1-Jun-21            42968701
16-Feb-19       1-Mar-19            42968701
1-Dec-15        1-Mar-18            42968701

I want to achieve this task using SAS programming. Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

Upvotes: 0

Views: 64

Answers (2)

Dirk Horsten
Dirk Horsten

Reputation: 3845

data want (keep=Member_Id Coverage_Start Termination_Date);
    merge have (rename=(Termination_Date=Expiration_Date))
          have (firstobs=2 keep=Member_Id Termination_Date 
                rename=(Member_Id = Next_Id Termination_Date=Expiration_Next));

    format Termination_Date date9.; 
    retain Termination_Date;
    if Member_Id ne lag(Member_Id) 
    or lag(Coverage_Start) gt Expiration_Date + 1
    then Termination_Date = Expiration_Date;

    if_ = Next_Id ne Member_Id 
    or Expiration_Next lt Coverage_Start - 1;
run;

Upvotes: 0

Serge Kashlik
Serge Kashlik

Reputation: 413

Try this:


Data have;
infile datalines delimiter=',';
input Coverage_Start date9. Termination_Date date9. Member_Id $;
format Coverage_Start Termination_Date ddmmyyp10.;
datalines;   
24-Jul-19,1-Jun-21,42968701
24-Jul-19,1-Mar-21,42968701
29-Feb-20,1-Mar-20,42968701
16-Feb-19,1-Mar-19,42968701
1-Mar-17,1-Mar-18,42968701
1-Mar-16,1-Mar-17,42968701
1-Dec-15,31-Dec-16,42968701
;
Run;

Proc sort data=have;
By member_id Coverage_Start Termination_Date;
Run;

Data _temp;
Set have;
By member_id;
Retain max_term_date;
If first.member_id then do;
    count_id = 1;
    max_term_date = .;
End;
Else if Coverage_Start > max_term_date then count_id + 1;
max_term_date = max(Termination_Date,max_term_date);
Run;

Proc sql;
Create table want(drop=count_id) as
Select member_id,count_id
      ,min(Coverage_Start) as Coverage_Start format=date9.
      ,max(Termination_Date) as Termination_Date format=date9.
From _temp
Group by 1,2;
Quit;

Upvotes: 0

Related Questions