ZeekDSA
ZeekDSA

Reputation: 86

SAS proc SQL and Inner join - what are alternative methods

What I want to do is to find an alternative to the following code:

PROC SQL;
   CREATE TABLE XXXX AS 
   SELECT DISTINCT t2.WC, t2.CWC
      FROM YYYY t1
           INNER JOIN ZZZZ t2 ON (t1.MC = t2.WC)
;
QUIT;

Could someone please help in doing the same thing using hash or any other method?

I have the following tables:

data have01;
infile cards truncover expandtabs;
input MC $ LC $ MCC $ MCN $ TLC $ DD $ ODS_TimeStamp ODS_LUpd zTPl $ PuD $;
cards;
1853 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1856 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1869 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
2024 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
2025 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
run;

You might have to format date column in the above table.

data have02;
infile cards truncover expandtabs;
input WPMVId ToSTimeStamp TId ASN WC $ CWC $ TSide $ MNo Y X;
cards;
1 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 1 -82140 2468
2 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 2 -81940 2466
3 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 3 -81739 2463
4 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 4 -81539 2459
5 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 5 -81339 2456
6 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 6 -81139 2453
run;

You might have to format date column in the above table.

Please help me using some alternative to SQL code above, specifically when I have issue that my Table 2 above is almost 0.8 billion rows data and it takes hell a lot time to run SQL query as above.

Upvotes: 0

Views: 204

Answers (2)

Johannes Bleher
Johannes Bleher

Reputation: 331

You can use a hash object. This is especially nice if you have a large dataset and you don't want to sort it prior to merging.

Suppose you have to data sets Aset and Bset in your work library and you want to merge them on the ID variables IDVar1 and IDVar2 (they uniquely identify each entry in both data sets and are both defined for the two datasets). All other variable names differ in the two data sets. The resulting data set will be called 'merged'. Here is a minimal example:

data Aset;
input idvar1 idvar2 var1inA var2inA;
datalines;
1 48 5 100
1 8 6  165
2 5 7  102
2 965 8 136
3 105 9 145
4 105 10 456
3 85 12 454
;
run;

data Bset;
input idvar1 idvar2 var1inB var2inB;
datalines;
2 48 5 100
2 965 6  165
2 5 7  102
1 965 8 136
5 105 9 145
3 105 10 456
3 85 12 454
;
run;

data merged (drop=retval);
    if 0 then set Aset;
    if _N_=1 then do;
     declare hash hh(dataset:'Aset',ordered:'A');
     hh.definekey('IDVar1','IDVar2');
     hh.definedata(all:'Y');
     hh.definedone();
    end;

    do while (not done);
      set Bset end=done;
      retval = hh.find();
      if (retval=0) then output;
    end;
    stop;
    run;

ODS LISTING:

Obs.    idvar1    idvar2    var1inA   var2inA    var1inB  var2inB
 1         2        965        8        136        6        165
 2         2          5        7        102        7        102
 3         3        105        9        145       10        456
 4         3         85       12        454       12        454

UPDATE:

The following code works for the data examples provided. I changed some of the formats to fit the values and added some length statements.

data have01;
infile cards truncover expandtabs;
length ODS_TimeStamp $23. ODS_LUpd $23. DD $10.;
input MC LC $ MCC MCN $ TLC $ DD $ ODS_TimeStamp $ ODS_LUpd $ zTPl PuD $;
cards;
1853 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1856 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1869 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
2024 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
2025 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
run;

data have02;
infile cards truncover expandtabs;
length ToSTimeStamp $23.;
input WPMVId ToSTimeStamp $ TId ASN WC CWC $ TSide $ MNo Y X;
cards;
1 21AUG2012:17:57:39.000 20949 1 2024 HPUS230 R 1 -82140 2468
2 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 2 -81940 2466
3 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 3 -81739 2463
4 21AUG2012:17:57:39.000 20949 1 2024 HPUS230 R 4 -81539 2459
5 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 5 -81339 2456
6 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 6 -81139 2453
run;


data merged (drop=retval);
    if 0 then set have01;
    if _N_=1 then do;
     declare hash hh(dataset:'have01',ordered:'A');
     hh.definekey('MC');
     hh.definedata(all:'Y');
     hh.definedone();
    end;

    do while (not done);
      set have02 (rename=(WC=MC)) end=done;
      retval = hh.find();
      if (retval=0) then output;
    end;
    stop;
    run;

Upvotes: 2

ZeekDSA
ZeekDSA

Reputation: 86

Anything better than this answer as below...

data work.xx;
merge
  work.yy (in=a keep=mc rename=(mc=wc))
  work.zz (in=b keep=wc cwc)
;
by wc;
if a and b;
run;

proc sort data=work.xx nodupkey;
by wc cwc;
run;

Upvotes: 0

Related Questions