Reputation: 139
I have a source table with ~40,000,000 unique/distinct entries and a new table with ~80,000,000 unique/distinct entries, both one columns each.
I need to create a flag column in the new table, which displays 1 if the corresponding observation's unique entry is also present(has matches) in the source table, and 0 if it has no matches.
How do I go about it most efficiently?
Is using proc sql inevitable, or can it be solved using SAS alone?
Upvotes: 1
Views: 1888
Reputation: 79
Here's my solution for your reference:
/*step1: create a new dataset containing unique values of your new table*/
data new1;
set you_new_table(keep=var1 var2 var3...varx);
proc sort nodup;
by var1 var2 var3...varx;
run;
/*step2: create a new dataset containing unique values of your old table*/
data old1;
set you_old_table(keep=var1 var2 var3...varx);
proc sort nodup;
by var1 var2 var3...varx;
run;
/*step3: merge the two table that just created (inner join)*/
data compare1;
merge old1(in=a) new1(in=b);
by var1 var2 var3...varx;
if a and b;
proc sort;
by var1 var2 var3...varx;
run;
/*step4: merge the inner join dataset with the new source table and generate the flag*/
data new2;
set you_new_table;
proc sort;
by var1 var2 var3...varx;
run;
data new3;
/*merge new2(in=a) new1(in=b); sry, a mistake, shame.*/
merge new2(in=a) compare1(in=b);
by var1 var2 var3...varx;
if a;
if b then flag=1;
else flag=0;
run;
Upvotes: 2
Reputation: 51566
To add a new variable to a dataset you need to make a new dataset. Assuming the data is already sorted on the key variable just do a MERGE. You can use the IN= dataset option to check which datasets contribute observations to the merge.
data want ;
merge new(in=innew) old(in=inold);
by key_var;
if innew;
new_var = inold;
run;
Upvotes: 3