itzy

Reputation: 11755

Multiple hash objects in SAS

I have two SAS data sets. The first is relatively small, and contains unique dates and a corresponding ID:

date   dateID
1jan90     10
2jan90     15
3jan90     20
...

The second data set is very large, and has two date variables:

dt1     dt2
1jan90  2jan90
3jan90  1jan90
...

I need to match both dt1 and dt2 to dateID, so the output would be:

id1  id2
10   15
20   10

Efficiency is very important here. I know how to use a hash object to do one match, so I could do one data step to do the match for dt1 and then another step for dt2, but I'd like to do both in one data step. How can this be done?

Here's how I would do the match for just dt1:

data tbl3;
 if 0 then set tbl1 tbl2;

 if _n_=1 then do;
  declare hash dts(dataset:'work.tbl2');
  dts.DefineKey('date');
  dts.DefineData('dateid');
  dts.DefineDone();
 end;

 set tbl1;
 if dts.find(key:dt1)=0 then output;
 run;

Upvotes: 5

Views: 2113

Answers (3)

Chris J

Reputation: 7769

A format would probably work just as efficiently given the size of your hash table...

data fmt ;
retain fmtname 'DTID' type 'N' ;
set tbl1 ;
start = date ;
label = dateid ;
run ;
proc format cntlin=fmt ; run ;

data tbl3 ;
  set tbl2 ;
  id1 = put(dt1,DTID.) ;
  id2 = put(dt2,DTID.) ;
run ;

Edited version based on below comments...

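data fmt ;
retain fmtname 'DTID' type 'I' ; /* type 'I' = numeric informat, so input() returns a numeric ID */
set tbl1 end=eof ;
start = date ;
label = dateid ;
output ;
if eof then do ;
  /* catch-all "other" range: any unmatched date maps to missing */
  hlo = 'O' ;
  label = . ;
  output ;
end ;
run ;
proc format cntlin=fmt ; run ;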

data tbl3 ;
  set tbl2 ;
  id1 = input(dt1,DTID.) ;
  id2 = input(dt2,DTID.) ;
run ;

Upvotes: 6

Joe

Reputation: 63424

I agree with the format solution, but if you want the hash solution, here it is. The key point is that you pass the variable you're matching as the key in each find() call (key:dt1, key:dt2), rather than relying on the key variable defined in the hash itself.

data tbl2;
informat date DATE7.;
input date   dateID;
datalines;
01jan90     10
02jan90     15
03jan90     20
;;;;
run;

data tbl1;
informat dt1 dt2 DATE7.;
input dt1     dt2;
datalines;
01jan90  02jan90
03jan90  01jan90
;;;;
run;

data tbl3;
 if 0 then set tbl1 tbl2;

 if _n_=1 then do;
  declare hash dts(dataset:'work.tbl2');
  dts.DefineKey('date');
  dts.DefineData('dateid');
  dts.DefineDone();
 end;

 set tbl1;
 rc1 = dts.find(key:dt1);
 if rc1=0 then id1=dateID;
 rc2 = dts.find(key:dt2);
 if rc2=0 then id2=dateID;
 if rc1=0 and rc2=0 then output;
 run;

Upvotes: 2

Robert Penridge

Reputation: 8513

I don't have SAS in front of me right now to test it, but the code would look like this:

 data tbl3;
   if 0 then set tbl1 tbl2;

   if _n_=1 then do;
     declare hash dts(dataset:'work.tbl2');
     dts.DefineKey('date');
     dts.DefineData('dateid');
     dts.DefineDone();
   end;

   set tbl1;

   date = dt1;
   if dts.find()=0 then do;
     id1 = dateId;
   end;

   date = dt2;
   if dts.find()=0 then do;
     id2 = dateId;
   end;

   if not missing(id1) or not missing(id2) then output; * KEEP ONLY RECORDS THAT MATCHED AT LEAST ONE;

   drop date dateId;
 run;

Upvotes: 2
