The_Doctor_28
The_Doctor_28

Reputation: 5

SAS filter dates one column if between (min, max) dates of another column

My dataset looks like below

ID Date 1 Date 2
A 01Aug2020 .
A 01Sep2020 .
A 01Oct2020 .
A 01Dec2020 .
A . 01Nov2020
A . 01Jan2021
B 01Mar2020 .
B 01Apr2020 .
B . 01Jan2020
C 01Jan2020 .
C 01Mar2020 .
C . 01Feb2020
D 01Jan2020 .
D 01Feb2020 .
D . 01Mar2020

There is never any crossover dates between dates 1 and 2, and whenever one field is populated the other is null.

I want to filter the table into different categories for each ID. The most important one is where Date2 is between some of the dates in the Date1 field (so between the min and max of Date1).

So this filters the table to ID A 01Nov2020 as this is between the min for ID A (01Aug2020) and max (01Dec2020). As well as ID C 01Feb2020, as this is between the min for ID C (01Jan2020) and max (01Mar2020).

Less importantly I then want to filter the table where date2 comes before the minimum of Date1 (for example ID B 01Jan2020). And then a final filtered table where date2 comes before the minimum of Date1 (for example ID D 01Mar2020 as well as ID A 01Jan2021).

The difficult part is that they are not aligned up in rows so have to work across multiple rows in the dataset. I believe I may need to use the Array function but I am not sure. Any help would be greatly appreciated.

Upvotes: 0

Views: 1351

Answers (1)

Tom
Tom

Reputation: 51621

PROC SQL should make this easy. It will automatically remerge the MIN() and MAX() values back to all detail rows so that you can use them in your calculations of the status of each DATE2 value.

data have;
  input id $ (date1 date2) (:date.);
  format date1 date2 date9.;
  row+1;
cards;
A 01Aug2020 .
A 01Sep2020 .
A 01Oct2020 .
A 01Dec2020 .
A .         01Nov2020
A .         01Jan2021
B 01Mar2020 .
B 01Apr2020 .
B .         01Jan2020
C 01Jan2020 .
C 01Mar2020 .
C .         01Feb2020
D 01Jan2020 .
D 01Feb2020 .
D .         01Mar2020
;

proc sql;
 create table want as
 select *
      , case when missing(date2) then 'missing'
             when (date2 between min(date1) and max(date1)) then '2 between 1'
             when (date2 < min(date1)) then '2 less than 1'
             when (date2 > max(date1)) then '2 greater than 1'
             else 'unknown'
       end as flag1

 from have
 group by id
 order by id, row
;
quit;

Results:

Obs    id        date1        date2    row    flag1

  1    A     01AUG2020            .      1    missing
  2    A     01SEP2020            .      2    missing
  3    A     01OCT2020            .      3    missing
  4    A     01DEC2020            .      4    missing
  5    A             .    01NOV2020      5    2 between 1
  6    A             .    01JAN2021      6    2 greater than 1
  7    B     01MAR2020            .      7    missing
  8    B     01APR2020            .      8    missing
  9    B             .    01JAN2020      9    2 less than 1
 10    C     01JAN2020            .     10    missing
 11    C     01MAR2020            .     11    missing
 12    C             .    01FEB2020     12    2 between 1
 13    D     01JAN2020            .     13    missing
 14    D     01FEB2020            .     14    missing
 15    D             .    01MAR2020     15    2 greater than 1

Upvotes: 0

Related Questions