antonmeissner
antonmeissner

Reputation: 3

Left join table on multiple tables in SAS

I've got multiple master tables in the same format with the same variables. I now want to left join another variable but I can't combine the master tables due to limited storage on my computer. Is there a way that I can left join a variable onto multiple master tables within one PROC SQL? Maybe with the help of a macro?

The LEFT JOIN code looks like this for one join but I'm looking for an alternative than to copy and paste this 5 times:

PROC SQL;
CREATE TABLE New AS
    SELECT a.*, b.Value
        FROM Old a LEFT JOIN Additional b
            ON a.ID = b.ID;
QUIT;

Upvotes: 0

Views: 3508

Answers (2)

DomPazz
DomPazz

Reputation: 12465

You can do this in a single pass through the data in a Data Step using a hash table to lookup values.

data new1 new2 new3;
set old1(in=a) old2(in=b) old3(in=c);
format value best.;
if _n_=1 then do;
    %create_hash(lk,id,value,"Additional");
end;
value = .;
rc = lk.find();

drop rc;

if a then
    output new1;
else if b then 
    output new2;
else if c then
    output new3;
run;

%create_hash() macro available here.

You could, alternatively, use Joe's format with the same Data Step syntax.

Upvotes: 1

Joe
Joe

Reputation: 63424

You can't do it in one create table statement, as it only creates one table at a time. But you can do a few things, depending on what your actual limiting factor is (you mention a few).

If you simply want to avoid writing the same code five times, but otherwise don't care how it executes, then just write the code in a macro, as you reference.

%macro update_table(old=, new=);
PROC SQL;
CREATE TABLE &new. AS
    SELECT a.*, b.Value
        FROM &old. a LEFT JOIN Additional b
            ON a.ID = b.ID;
QUIT;
%mend update_table;
%update_table(old=old1, new=new1)
%update_table(old=old2, new=new2)
%update_table(old=old3, new=new3)

Of course, if the names of the five tables are in a pattern, you can perhaps automate this further based on that pattern, but you don't give sufficient information to figure that out.

If you on the other hand need to do this more efficiently in terms of processing than running the SQL query five times, it can be done a number of ways, depending on the specifics of your additional table and your specific limitations. It looks to me that you have a good use case for a format lookup here, for example; see for example Jenine Eason's paper, Proc Format, a Speedy Alternative to Sort/Merge. If you're just merging on the ID, this is very easy.

data for_format;
  set additional;
  start = ID;
  label = value;
  fmtname='AdditionalF'; *or '$AdditionalF' if ID is character-valued;
  output;
  if _n_=1 then do;  *creating an "other" option so it returns missing if not found;
    hlo='o';
    label = ' ';
    output;
  end;
run;

And then you just have five data steps with a PUT statement adding the value, or even you could simply format the ID variable with that format and it would have that value whenever you did most PROCs (if this is something like a classifier that you don't truly need "in" the data).

Upvotes: 1

Related Questions