Reputation: 873
I have multiple dates per IID
and I'm looking to find the max (maybe min down the road) within the group only if another variable is missing. Take the following test data:
data test;
informat EFF_DATE END_DATE date9.;
length IID $12 EFF_DATE END_DATE 8;
input IID $ EFF_DATE END_DATE;
format EFF_DATE END_DATE date9.;
datalines;
A 11MAY15 .
A 12SEP13 01JAN15
A 13DEC12 .
B 25AUG14 .
B 13JUN17 01JAN15
B 03NOV13 .
;
run;
My plan is to build off the following code to calculate the max EFF_DATE
by IID
, but only if END_DATE
is missing which I cannot figure out how to add this logic in sql
.
proc sql noprint;
create table test_fill as
select *,
max(EFF_DATE) /* add something here? */ as MOST_RECENT_EFF_DATE format=date9.
from test group by IID;
quit;
The following shows the desired output:
IID EFF_DATE END_DATE MOST_RECENT_EFF_DATE
A 11MAY15 . 11MAY15
A 12SEP13 01JAN15 11MAY15
A 13DEC12 . 11MAY15
B 25AUG14 . 25AUG14
B 13JUN17 01JAN15 25AUG14
B 03NOV13 . 25AUG14
For IID=B, even though 13JUN17 is the max date we cannot accept it because END_DATE is non-missing so I need to take the next highest date. I'm sure I could write a bunch of data steps and merge back but I'd like to try to knock it out in sql
if possible.
Upvotes: 1
Views: 2598
Reputation: 63424
You include the condition as a case
inside the max
function, with else
being null (which is not max
ever, nor is it min
).
proc sql noprint;
create table test_fill as
select *,
max(case when missing(end_Date) then EFF_DATE else . end) as MOST_RECENT_EFF_DATE format=date9.
from test group by IID;
quit;
Upvotes: 4