Reputation: 5
My dataset looks like below
ID | Date 1 | Date 2 |
---|---|---|
A | 01Aug2020 | . |
A | 01Sep2020 | . |
A | 01Oct2020 | . |
A | 01Dec2020 | . |
A | . | 01Nov2020 |
A | . | 01Jan2021 |
B | 01Mar2020 | . |
B | 01Apr2020 | . |
B | . | 01Jan2020 |
C | 01Jan2020 | . |
C | 01Mar2020 | . |
C | . | 01Feb2020 |
D | 01Jan2020 | . |
D | 01Feb2020 | . |
D | . | 01Mar2020 |
There is never any crossover dates between dates 1 and 2, and whenever one field is populated the other is null.
I want to filter the table into different categories for each ID. The most important one is where Date2 is between some of the dates in the Date1 field (so between the min and max of Date1).
So this filters the table to ID A 01Nov2020 as this is between the min for ID A (01Aug2020) and max (01Dec2020). As well as ID C 01Feb2020, as this is between the min for ID C (01Jan2020) and max (01Mar2020).
Less importantly I then want to filter the table where date2 comes before the minimum of Date1 (for example ID B 01Jan2020). And then a final filtered table where date2 comes before the minimum of Date1 (for example ID D 01Mar2020 as well as ID A 01Jan2021).
The difficult part is that they are not aligned up in rows so have to work across multiple rows in the dataset. I believe I may need to use the Array function but I am not sure. Any help would be greatly appreciated.
Upvotes: 0
Views: 1351
Reputation: 51621
PROC SQL should make this easy. It will automatically remerge the MIN() and MAX() values back to all detail rows so that you can use them in your calculations of the status of each DATE2 value.
data have;
input id $ (date1 date2) (:date.);
format date1 date2 date9.;
row+1;
cards;
A 01Aug2020 .
A 01Sep2020 .
A 01Oct2020 .
A 01Dec2020 .
A . 01Nov2020
A . 01Jan2021
B 01Mar2020 .
B 01Apr2020 .
B . 01Jan2020
C 01Jan2020 .
C 01Mar2020 .
C . 01Feb2020
D 01Jan2020 .
D 01Feb2020 .
D . 01Mar2020
;
proc sql;
create table want as
select *
, case when missing(date2) then 'missing'
when (date2 between min(date1) and max(date1)) then '2 between 1'
when (date2 < min(date1)) then '2 less than 1'
when (date2 > max(date1)) then '2 greater than 1'
else 'unknown'
end as flag1
from have
group by id
order by id, row
;
quit;
Results:
Obs id date1 date2 row flag1
1 A 01AUG2020 . 1 missing
2 A 01SEP2020 . 2 missing
3 A 01OCT2020 . 3 missing
4 A 01DEC2020 . 4 missing
5 A . 01NOV2020 5 2 between 1
6 A . 01JAN2021 6 2 greater than 1
7 B 01MAR2020 . 7 missing
8 B 01APR2020 . 8 missing
9 B . 01JAN2020 9 2 less than 1
10 C 01JAN2020 . 10 missing
11 C 01MAR2020 . 11 missing
12 C . 01FEB2020 12 2 between 1
13 D 01JAN2020 . 13 missing
14 D 01FEB2020 . 14 missing
15 D . 01MAR2020 15 2 greater than 1
Upvotes: 0