Reputation: 86
What I want to do is to find an alternative to the following code:
PROC SQL;
CREATE TABLE XXXX AS
SELECT DISTINCT t2.WC, t2.CWC
FROM YYYY t1
INNER JOIN ZZZZ t2 ON (t1.MC = t2.WC)
;
QUIT;
Could someone please help in doing the same thing using hash or any other method?
I have the following tables:
data have01;
infile cards truncover expandtabs;
input MC $ LC $ MCC $ MCN $ TLC $ DD $ ODS_TimeStamp ODS_LUpd zTPl $ PuD $;
cards;
1853 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1856 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1869 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
2024 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
2025 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
run;
You might have to format date column in the above table.
data have02;
infile cards truncover expandtabs;
input WPMVId ToSTimeStamp TId ASN WC $ CWC $ TSide $ MNo Y X;
cards;
1 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 1 -82140 2468
2 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 2 -81940 2466
3 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 3 -81739 2463
4 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 4 -81539 2459
5 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 5 -81339 2456
6 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 6 -81139 2453
run;
You might have to format date column in the above table.
Please help me using some alternative to SQL code above, specifically when I have issue that my Table 2 above is almost 0.8 billion rows data and it takes hell a lot time to run SQL query as above.
Upvotes: 0
Views: 204
Reputation: 331
You can use a hash object. This is especially nice if you have a large dataset and you don't want to sort it prior to merging.
Suppose you have to data sets Aset and Bset in your work library and you want to merge them on the ID variables IDVar1 and IDVar2 (they uniquely identify each entry in both data sets and are both defined for the two datasets). All other variable names differ in the two data sets. The resulting data set will be called 'merged'. Here is a minimal example:
data Aset;
input idvar1 idvar2 var1inA var2inA;
datalines;
1 48 5 100
1 8 6 165
2 5 7 102
2 965 8 136
3 105 9 145
4 105 10 456
3 85 12 454
;
run;
data Bset;
input idvar1 idvar2 var1inB var2inB;
datalines;
2 48 5 100
2 965 6 165
2 5 7 102
1 965 8 136
5 105 9 145
3 105 10 456
3 85 12 454
;
run;
data merged (drop=retval);
if 0 then set Aset;
if _N_=1 then do;
declare hash hh(dataset:'Aset',ordered:'A');
hh.definekey('IDVar1','IDVar2');
hh.definedata(all:'Y');
hh.definedone();
end;
do while (not done);
set Bset end=done;
retval = hh.find();
if (retval=0) then output;
end;
stop;
run;
ODS LISTING:
Obs. idvar1 idvar2 var1inA var2inA var1inB var2inB
1 2 965 8 136 6 165
2 2 5 7 102 7 102
3 3 105 9 145 10 456
4 3 85 12 454 12 454
The following code works for the data examples provided. I changed some of the formats to fit the values and added some length statements.
data have01;
infile cards truncover expandtabs;
length ODS_TimeStamp $23. ODS_LUpd $23. DD $10.;
input MC LC $ MCC MCN $ TLC $ DD $ ODS_TimeStamp $ ODS_LUpd $ zTPl PuD $;
cards;
1853 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1856 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1869 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
2024 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
2025 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
run;
data have02;
infile cards truncover expandtabs;
length ToSTimeStamp $23.;
input WPMVId ToSTimeStamp $ TId ASN WC CWC $ TSide $ MNo Y X;
cards;
1 21AUG2012:17:57:39.000 20949 1 2024 HPUS230 R 1 -82140 2468
2 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 2 -81940 2466
3 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 3 -81739 2463
4 21AUG2012:17:57:39.000 20949 1 2024 HPUS230 R 4 -81539 2459
5 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 5 -81339 2456
6 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 6 -81139 2453
run;
data merged (drop=retval);
if 0 then set have01;
if _N_=1 then do;
declare hash hh(dataset:'have01',ordered:'A');
hh.definekey('MC');
hh.definedata(all:'Y');
hh.definedone();
end;
do while (not done);
set have02 (rename=(WC=MC)) end=done;
retval = hh.find();
if (retval=0) then output;
end;
stop;
run;
Upvotes: 2
Reputation: 86
Anything better than this answer as below...
data work.xx;
merge
work.yy (in=a keep=mc rename=(mc=wc))
work.zz (in=b keep=wc cwc)
;
by wc;
if a and b;
run;
proc sort data=work.xx nodupkey;
by wc cwc;
run;
Upvotes: 0