Reputation: 145
I have two datasets need to merge.
The first one is a big dataset with studyid and discharg(the date when the patients got their discharge).
The second one is the fewer observations than first one. They have two columns: studyid and call_mad(the date when nurse call the patient after discharge date). Not all discharges get a call from nurse.
The first table is
STUDYID DISCHARG
10011 2008-10-29
10011 2008-11-7
10011 2008-11-18
10011 2009-10-17
10011 2010-1-2
10011 2010-1-22
The second table is
STUDYID CALL_MAD
10011 2009-10-19
10011 2010-1-25
The final table I want
STUDYID DISCHARG CALL_MAD
10011 2008-10-29
10011 2008-11-7
10011 2008-11-18
10011 2009-10-17 2009-10-19
10011 2010-1-2
10011 2010-1-22 2010-1-25
Hopefully, it is clear. Thanks in advance.
Jane
Upvotes: 2
Views: 278
Reputation: 7602
I had the same idea as thelatemail, i.e. you first extract the latest DISCHARG date that is < (or possibly <=) each CALL_MAD date, then merge that data back to the original dataset. I think that is the best that can be done with the data structured as it is, although there is potential for this logic to break down (e.g. if the nurse's call didn't relate to the latest discharge). Ideally you would want to add the DISCHARG date column to the second table as a secondary key, so that it would be easy to join on STUDYID and DISCHARG date without making any assumptions.
Anyway, here the code I used.
data ds1;
input STUDYID DISCHARG :yymmdd10.;
format DISCHARG yymmdd10.;
datalines;
10011 2008-10-29
10011 2008-11-7
10011 2008-11-18
10011 2009-10-17
10011 2010-1-2
10011 2010-1-22
;
run;
data ds2;
input STUDYID CALL_MAD :yymmdd10.;
format CALL_MAD yymmdd10.;
datalines;
10011 2009-10-19
10011 2010-1-25
;
run;
proc sql;
create table ds3 as select
ds1.*,
ds2.call_mad
from ds1 inner join ds2 on ds1.studyid=ds2.studyid and ds2.call_mad>ds1.discharg
group by ds1.studyid,ds2.call_mad
having ds1.discharg=max(ds1.discharg);
create table want as select
ds1.*,
ds3.call_mad
from ds1 left join ds3 on ds1.studyid=ds3.studyid and ds1.discharg=ds3.discharg;
quit;
Upvotes: 1
Reputation: 93843
I haven't tested this a lot, but I think it should work ok. I am no sas expert.
It assumes your two data sets are named 'first' and 'second':
/* set unique identifiers */
data first;
set first;
keyida = _N_;
run;
data second;
set second;
keyidb = _N_;
run;
Intermediate look at the data:
First looks like:
study discharg keyida
10011 10/29/2008 1
10011 11/07/2008 2
10011 11/18/2008 3
10011 10/17/2009 4
10011 01/02/2010 5
10011 01/22/2010 6
Second looks like:
study discharg keyidb
10011 10/19/2009 1
10011 01/25/2010 2
Now merge and clean-up:
/* merge the data on id and call made after discharge */
proc sql;
create table final as
select a.studyid,
a.discharg,
a.keyida,
b.studyid,
b.call_mad,
b.keyidb
from first a,
second b
where a.studyid = b.studyid and
b.call_mad > a.discharg
order by keyida;
quit;
/* remove duplicates of original dataset */
proc sort data=final nodupkey;
by keyida;
run;
/* sort by secondary id and date of calling */
proc sort data=final;
by call_mad
keyidb;
run;
/* remove duplicates of the secondary dataset */
data final (drop = keyida keyidb);
set final;
by call_mad
keyidb;
if not (last.call_mad and last.keyidb) then call_mad =.;
run;
Final dataset looks like:
study discharg call_mad
10011 10/29/2008 .
10011 11/07/2008 .
10011 10/17/2009 10/19/2009
10011 11/18/2008 .
10011 01/02/2010 .
10011 01/22/2010 01/25/2010
Upvotes: 2