Reputation: 13
I am quite new to using SAS and would like to ask for your help on merging two datasets that do not contain one key identifier but require a combination of three conditions to be merged successfully.
I have two datasets:
StartDate
, EndDate
, EmployeeNumber
and CustomerNumber
. EmployeeNumber
and CustomerNumber
The goal is to keep the first dataset and add information from the second dataset in case it is available - based on a match of date, employee number and customer number.
It is important that the date variable in the activity dataset is just on day while in the visits dataset it is a range between start and end date. So, the data sets look like this: Base datasets
Logically, it should be a merge based on three conditions:
So, the ideal new dataset would look like this:
Intended result of merge
If there is no entry in the activity dataset, the new data set should simply contain a blank for the respective variables. For all matches, it should add the respective variable.
When trying (rather simple) merge statements in SAS, I usually get an error message stating that "BY variables are not properly sorted on data set". Furthermore, I am not sure how to implement the conditions (in particular the match of date to a range) properly.
Can this sort of merge be done in SAS or SQL?
Any help is greatly appreciated!!!
Upvotes: 0
Views: 1731
Reputation: 324
I tend to use proc sql for this kind of stuff, it's just a relatively simple join in which SQL excels. Try something like this:
proc sql;
create table merged_data as
select a.*, b.other_variable, b.other_variable2
from dataset1 as a
left join dataset2 as b
on a.EmployeeNumber = b.EmployeeNumber
and a.CustomerNumber = b.CustomerNumber
and b.Date between a.StartDate and a.EndDate
;
quit;
Upvotes: 0