JCP
JCP

Reputation: 19

SAS merge multiple columns with same data

I need help with a merge. I have two tables as below- Table 1

ID      ID1     ID2     ID3     ID4     ID5
1005    2005    3005    4005    5005    7105
3005    4005    5005    7105        
4005    5005    7105            
5005    7105                
2005    3005    4005    5005    7105    
7105                    

Table 2

ID      Names
1005    John
3005    Rick
4005    Sam
5005    Harry
2005    Mary
7105    Deena

I need an efficient way to merge with columns in Table 1 with Table 2. I can merge in separate datasteps but is there a way I can do it in more efficient way?

proc sql;
create merge1 as 
select *
from table1 a
left join table2 b on a.id = b.id;
quit;

proc sql;
create merge2 as 
select *
from merge1 a
left join table2 b on a.id = b.id;
quit;

Result I want with all columns (examplebelow):

ID    NamesID     ID1     NamesID1    ID2     NamesID2    ID3
1005    John     2005     Mary       3005     Rick       4005
3005    Rick     4005     Sam        5005     Harry      7105
4005    Sam      5005     Harry      7105     Deena 
5005    Harry    7105     Deena         
2005    Mary     3005     Rick       4005     Sam        5005
7105    Deena                                       

Thanks!

Upvotes: 0

Views: 557

Answers (1)

Chris Long
Chris Long

Reputation: 1319

Here is the format-based solution:

data table1;
  length id id1 id2 id3 id4 id5 8;
  infile datalines missover;
  input id id1 id2 id3 id4 id5;
  cards;
1005    2005    3005    4005    5005    7105
3005    4005    5005    7105        
4005    5005    7105            
5005    7105                
2005    3005    4005    5005    7105    
7105                    
;
run;

data table2;
  length id 8 names $ 10;
  input id names;
  cards;
1005    John
3005    Rick
4005    Sam
5005    Harry
2005    Mary
7105    Deena
;
run;

* Create a CNTLIN data set defining the required format;
data fmt_in;
  set table2;
  fmtname = 'names';
  start = id;
  label = names;
run;

* Run PROC FORMAT to generate the format from the CNTLIN data set;
proc format cntlin=fmt_in;
run;

* Apply the format to the input data set;
data out;
  set table1;
  namesID  = put(id, names.);
  namesID1 = put(id1, names.);
  namesID2 = put(id2, names.);
  namesID3 = put(id3, names.);
  namesID4 = put(id4, names.);
  namesID5 = put(id5, names.);
run;

This will be very efficient for large inputs because it doesn't require multiple sorts. In general, of course, your input data set table1 should be normalised to be tall and thin so that there is only one column holding IDs; that would have made the merge-based solution trivial, though probably still slower than using a format.

Upvotes: 1

Related Questions