Reputation: 19
I need help with a merge. I have two tables as below- Table 1
ID ID1 ID2 ID3 ID4 ID5
1005 2005 3005 4005 5005 7105
3005 4005 5005 7105
4005 5005 7105
5005 7105
2005 3005 4005 5005 7105
7105
Table 2
ID Names
1005 John
3005 Rick
4005 Sam
5005 Harry
2005 Mary
7105 Deena
I need an efficient way to merge with columns in Table 1 with Table 2. I can merge in separate datasteps but is there a way I can do it in more efficient way?
proc sql;
create merge1 as
select *
from table1 a
left join table2 b on a.id = b.id;
quit;
proc sql;
create merge2 as
select *
from merge1 a
left join table2 b on a.id = b.id;
quit;
Result I want with all columns (examplebelow):
ID NamesID ID1 NamesID1 ID2 NamesID2 ID3
1005 John 2005 Mary 3005 Rick 4005
3005 Rick 4005 Sam 5005 Harry 7105
4005 Sam 5005 Harry 7105 Deena
5005 Harry 7105 Deena
2005 Mary 3005 Rick 4005 Sam 5005
7105 Deena
Thanks!
Upvotes: 0
Views: 557
Reputation: 1319
Here is the format-based solution:
data table1;
length id id1 id2 id3 id4 id5 8;
infile datalines missover;
input id id1 id2 id3 id4 id5;
cards;
1005 2005 3005 4005 5005 7105
3005 4005 5005 7105
4005 5005 7105
5005 7105
2005 3005 4005 5005 7105
7105
;
run;
data table2;
length id 8 names $ 10;
input id names;
cards;
1005 John
3005 Rick
4005 Sam
5005 Harry
2005 Mary
7105 Deena
;
run;
* Create a CNTLIN data set defining the required format;
data fmt_in;
set table2;
fmtname = 'names';
start = id;
label = names;
run;
* Run PROC FORMAT to generate the format from the CNTLIN data set;
proc format cntlin=fmt_in;
run;
* Apply the format to the input data set;
data out;
set table1;
namesID = put(id, names.);
namesID1 = put(id1, names.);
namesID2 = put(id2, names.);
namesID3 = put(id3, names.);
namesID4 = put(id4, names.);
namesID5 = put(id5, names.);
run;
This will be very efficient for large inputs because it doesn't require multiple sorts. In general, of course, your input data set table1
should be normalised to be tall and thin so that there is only one column holding IDs; that would have made the merge-based solution trivial, though probably still slower than using a format.
Upvotes: 1