Reputation: 11755
I have two SAS data sets. The first is relatively small, and contains unique dates and a corresponding ID:
date dateID
1jan90 10
2jan90 15
3jan90 20
...
The second data set very large, and has two date variables:
dt1 dt2
1jan90 2jan90
3jan90 1jan90
...
I need to match both dt1
and dt2
to dateID
, so the output would be:
id1 id2
10 15
20 10
Efficiency is very important here. I know how to use a hash object to do one match, so I could do one data step to do the match for dt1
and then another step for dt2
, but I'd like to do both in one data step. How can this be done?
Here's how I would do the match for just dt1
:
data tbl3;
if 0 then set tbl1 tbl2;
if _n_=1 then do;
declare hash dts(dataset:'work.tbl2');
dts.DefineKey('date');
dts.DefineData('dateid');
dts.DefineDone();
end;
set tbl1;
if dts.find(key:date)=0 then output;
run;
Upvotes: 5
Views: 2113
Reputation: 7769
A format would probably work just as efficiently given the size of your hash table...
data fmt ;
retain fmtname 'DTID' type 'N' ;
set tbl1 ;
start = date ;
label = dateid ;
run ;
proc format cntlin=fmt ; run ;
data tbl3 ;
set tbl2 ;
id1 = put(dt1,DTID.) ;
id2 = put(dt2,DTID.) ;
run ;
Edited version based on below comments...
data fmt ;
retain fmtname 'DTID' type 'I' ;
set tbl1 end=eof ;
start = date ;
label = dateid ;
output ;
if eof then do ;
hlo = 'O' ;
label = . ;
output ;
end ;
run ;
proc format cntlin=fmt ; run ;
data tbl3 ;
set tbl2 ;
id1 = input(dt1,DTID.) ;
id2 = input(dt2,DTID.) ;
run ;
Upvotes: 6
Reputation: 63424
I agree with the format solution, for one, but if you want to do the hash solution, here it goes. The basic thing here is that you define the key as the variable you're matching, not in the hash itself.
data tbl2;
informat date DATE7.;
input date dateID;
datalines;
01jan90 10
02jan90 15
03jan90 20
;;;;
run;
data tbl1;
informat dt1 dt2 DATE7.;
input dt1 dt2;
datalines;
01jan90 02jan90
03jan90 01jan90
;;;;
run;
data tbl3;
if 0 then set tbl1 tbl2;
if _n_=1 then do;
declare hash dts(dataset:'work.tbl2');
dts.DefineKey('date');
dts.DefineData('dateid');
dts.DefineDone();
end;
set tbl1;
rc1 = dts.find(key:dt1);
if rc1=0 then id1=dateID;
rc2 = dts.find(key:dt2);
if rc2=0 then id2=dateID;
if rc1=0 and rc2=0 then output;
run;
Upvotes: 2
Reputation: 8513
I don't have SAS in front of me right now to test it but the code would look like this:
data tbl3;
if 0 then set tbl1 tbl2;
if _n_=1 then do;
declare hash dts(dataset:'work.tbl2');
dts.DefineKey('date');
dts.DefineData('dateid');
dts.DefineDone();
end;
set tbl1;
date = dt1;
if dts.find()=0 then do;
id1 = dateId;
end;
date = dt2;
if dts.find()=0 then do;
id2 = dateId;
end;
if dt1 or dt2 then do output; * KEEP ONLY RECORDS THAT MATCHED AT LEAST ONE;
drop date dateId;
run;
Upvotes: 2