Reputation: 8345
If you have multiple datasets (hundreds) with the same variable names and would like to merge them by a key, is there a simple way to control which value of a variable to take for the variables that are not the key? One way to do this would be a rename on the merge statement then write another step to use those renamed variable to calculate the most frequent value with an array...but I'm really wondering if there's an built in way of handling this. For example:
data ds1;
infile datalines dsd delimiter=' ';
input var1 $ var2;
datalines;
a 1
b 2
;
run;
data ds2;
infile datalines dsd delimiter=' ';
input var1 $ var2;
datalines;
a
b 2
;
run;
data ds3;
infile datalines dsd delimiter=' ';
input var1 $ var2;
datalines;
a 1
b
;
run;
data ds123;
merge ds1 ds2 ds3;
by var1;
run;
This code will 'pick' the 'furthest right' var2 i.e. the dataset ds123:
a 1
b
But I may want it to be:
a 1
b 2
as this would match the most frequent values.
Upvotes: 0
Views: 61
Reputation: 12465
Use an SQL join and the coalesce function. Specify the preference order in the coalesce and the first non-missing in that order will be used.
proc sql noprint;
create table ds123 as
select a.var1,
coalesce(a.var2,b.var2,c.var2) as var2
from ds1 as a,
ds2 as b,
ds3 as c
where a.var1 = b.var1
and b.var1 = c.var1;
quit;
Upvotes: 1