Reputation: 337
I would like to know how to merge two datasets in SAS using a variable's value in the first dataset to select and test a variable in the second dataset.
As an example consider two datasets. The first dataset contains four baby names and the days they were born. The second data set contains three doctors and an array of indicator variables noting if each doctor worked on a particular day. For example Dr. Smith worked on days 2 and 3 only. I would like to create a dataset that lists all the possible baby-doctor combinations where the doctor was working on the day the baby was born.
data babies;
input baby_name $ birth_day;
datalines;
Jake 1
Sonny 4
North 5
Apple 6
;
run;
data doctors;
input DrLastname $ day1 day2 day3 day4 day5 day6;
datalines;
Jones 1 0 0 1 1 1
Smith 0 1 1 0 0 0
Lewis 1 1 1 0 0 0
;
run;
The solution seems like it should be something like this
proc sql;
create table merged as
select babies.*, doctors.*
from babies, doctors
where doctors.day(babies.birth_day) = 1; *<--- incorrect;
quit;
The output should be:
baby_name birth_day DrLastName
Jake 1 Jones
Jake 1 Lewis
Sonny 4 Jones
North 5 Jones
Apple 6 Jones
I have run into this problem a few times and would love to know if this is kind of merge is possible in SAS. Thanks for any help you can provide.
Upvotes: 0
Views: 123
Reputation: 63424
While I probably would also transpose the dataset, it is possible to do so without transposing.
data babies_doctors;
set babies;
do _i = 1 to nobs_doctors;
set doctors point=_i nobs=nobs_doctors;
array days day1-day6;
if days[birth_Day] then output;
end;
run;
This will not be fast, as it checks all rows in the dataset, but it's possible.
Fastest is probably to load it into a vertical hash table (which you could do easily) or a temporary array.
data babies_doctors_array;
array drnames[32767] $80 _temporary_;
array drdays[32767,6] _temporary_;
if _n_=1 then do;
do _i = 1 to nobs_doctors;
set doctors point=_i nobs=nobs_doctors;
array days day1-day6;
drnames[_i]=DrLastname;
do _j = 1 to dim(days);
drdays[_i,_j]=days[_j];
end;
end;
end;
set babies;
do _k = 1 to nobs_doctors;
if drdays[_k,birth_day]=1 then do;
baby_drlastname = drnames[_k];
output;
end;
end;
run;
Upvotes: 3
Reputation: 1985
I might shift the second dataset and then merge on day.
Something like (in untested pseudo code):
data new_1-new_6;
set doctor;
array day_1-day_6 day_{6}
for i in 1 to 6:
if day_{i} = 1 then do;
day = i;
output new_{i};
end;
end;
run;
data stacked;
set day_1-day_6;
run;
Then simply merge based on the field day.
Upvotes: 1