user1238178
user1238178

Reputation: 145

few_to_many merge?

I have two datasets need to merge.

The first one is a big dataset with studyid and discharg(the date when the patients got their discharge).

The second one is the fewer observations than first one. They have two columns: studyid and call_mad(the date when nurse call the patient after discharge date). Not all discharges get a call from nurse.

The first table is

STUDYID   DISCHARG      

10011   2008-10-29          

10011   2008-11-7           

10011   2008-11-18          

10011   2009-10-17     

10011   2010-1-2       

10011   2010-1-22  

The second table is

 STUDYID        CALL_MAD

 10011          2009-10-19
 10011          2010-1-25  

The final table I want

STUDYID   DISCHARG      CALL_MAD

10011   2008-10-29          

10011   2008-11-7           

10011   2008-11-18          

10011   2009-10-17      2009-10-19

10011   2010-1-2       

10011   2010-1-22       2010-1-25  

Hopefully, it is clear. Thanks in advance.

Jane

Upvotes: 2

Views: 278

Answers (2)

Longfish
Longfish

Reputation: 7602

I had the same idea as thelatemail, i.e. you first extract the latest DISCHARG date that is < (or possibly <=) each CALL_MAD date, then merge that data back to the original dataset. I think that is the best that can be done with the data structured as it is, although there is potential for this logic to break down (e.g. if the nurse's call didn't relate to the latest discharge). Ideally you would want to add the DISCHARG date column to the second table as a secondary key, so that it would be easy to join on STUDYID and DISCHARG date without making any assumptions.

Anyway, here the code I used.

data ds1;
input STUDYID DISCHARG :yymmdd10.;
format DISCHARG yymmdd10.;
datalines;
10011   2008-10-29
10011   2008-11-7
10011   2008-11-18
10011   2009-10-17
10011   2010-1-2
10011   2010-1-22
;
run;

data ds2;
input STUDYID CALL_MAD :yymmdd10.;
format CALL_MAD yymmdd10.;
datalines;
10011   2009-10-19
10011   2010-1-25
;
run;

proc sql;
create table ds3 as select
ds1.*,
ds2.call_mad
from ds1 inner join ds2 on ds1.studyid=ds2.studyid and ds2.call_mad>ds1.discharg
group by ds1.studyid,ds2.call_mad
having ds1.discharg=max(ds1.discharg);

create table want as select
ds1.*,
ds3.call_mad
from ds1 left join ds3 on ds1.studyid=ds3.studyid and ds1.discharg=ds3.discharg;
quit;

Upvotes: 1

thelatemail
thelatemail

Reputation: 93843

I haven't tested this a lot, but I think it should work ok. I am no sas expert.

It assumes your two data sets are named 'first' and 'second':

/* set unique identifiers */
data first;
    set first;
    keyida = _N_;
run;

data second;
    set second;
    keyidb = _N_;
run;

Intermediate look at the data:

First looks like:

study   discharg    keyida
10011   10/29/2008  1
10011   11/07/2008  2
10011   11/18/2008  3
10011   10/17/2009  4
10011   01/02/2010  5
10011   01/22/2010  6

Second looks like:

study   discharg    keyidb
10011   10/19/2009  1
10011   01/25/2010  2

Now merge and clean-up:

/* merge the data on id and call made after discharge */
proc sql;
    create table final as
    select  a.studyid,
            a.discharg,
            a.keyida,
            b.studyid,
            b.call_mad,
            b.keyidb
    from    first a,
            second b
    where   a.studyid = b.studyid and
            b.call_mad > a.discharg
    order by keyida; 
quit;

/* remove duplicates of original dataset */
proc sort data=final nodupkey;
    by  keyida;
run;

/* sort by secondary id and date of calling */
proc sort data=final;
    by  call_mad
        keyidb;
run;

/* remove duplicates of the secondary dataset */
data final (drop = keyida keyidb);
    set final;
    by  call_mad
        keyidb;
    if not (last.call_mad and last.keyidb) then call_mad =.;
run;

Final dataset looks like:

study   discharg    call_mad
10011   10/29/2008  .
10011   11/07/2008  .
10011   10/17/2009  10/19/2009
10011   11/18/2008  .
10011   01/02/2010  .
10011   01/22/2010  01/25/2010

Upvotes: 2

Related Questions