Reputation: 51
I have two SAS datasets of following type
Dataset1, D1 is as follows
ID Date Amount
x1 10/12/2015 100
x2 200
x2 150
x3 10/10/2014 90
x4 60
Dataset, D2 is of the form
ID Date
x2 10/12/2016
x4 1/1/2016
Dataset D1 can have duplicate values of ID. Dataset D2 has only unique values of ID. Further, D2 consists of only IDs in Dataset D1 which has missing values of variable date (x2 and x4 have date missing in D1). I want to merge D1 with D2 such that the output is as follows
ID Date Amount
x1 10/12/2015 100
x2 10/12/2016 200
x2 10/12/2016 150
x3 10/10/2014 90
x4 1/1/2016 60
Is this doable without using proc sql in SAS. Can I use merge?
I tried using the following but to no use (and it should not work either because D1 has duplicate IDs)
data x;
merge D1 (in=in1) D2(in=in2);
by ID;
if in1;
run;
Upvotes: 0
Views: 239
Reputation: 6378
DATA step merge of two datasets works fine when only one of the datasets has a unique ID. The problem with your merge is that the DATE variables from each dataset will collide:
231 options msglevel=i;
232 data x;
233 merge D1 D2;
234 by ID;
235 put (ID Date Amount)(=);
236 run;
INFO: The variable Date on data set WORK.D1 will be overwritten by data set WORK.D2.
ID=x1 Date=10/12/2015 Amount=100
ID=x2 Date=10/12/2016 Amount=200
ID=x2 Date=. Amount=150
ID=x3 Date=10/10/2014 Amount=90
ID=x4 Date=01/01/2016 Amount=60
The MSGLEVEL=i option generates the INFO: line in the log which alerts you to the collision. In this case you almost get the results you want, despite the collision. The problem is the third record, where DATE is missing. This is a side-effect of having a collision in a one-to-many merge.
I would suggest you avoid the collision by renaming the DATE variable in each dataset. You can then compute a new DATE variable by using the COALESCE() function, which returns the first value that is not missing:
237 data want;
238 merge d1 (keep=ID Date Amount rename=(Date=Date1))
239 d2 (keep=ID Date rename=(Date=Date2))
240 ;
241 by ID;
242 Date=coalesce(Date1,Date2);
243 put (ID Date1 Date2 Date Amount)(=);
244 format Date mmddyy10.;
245 run;
ID=x1 Date1=10/12/2015 Date2=. Date=10/12/2015 Amount=100
ID=x2 Date1=. Date2=10/12/2016 Date=10/12/2016 Amount=200
ID=x2 Date1=. Date2=10/12/2016 Date=10/12/2016 Amount=150
ID=x3 Date1=10/10/2014 Date2=. Date=10/10/2014 Amount=90
ID=x4 Date1=. Date2=01/01/2016 Date=01/01/2016 Amount=60
Upvotes: 1
Reputation: 31
Is this doable without using proc sql in SAS. Can I use merge?
Yes, any SQL step can be done in a Data step, but may take up more or less code space.
Here is a potential solution:
data DateN DateY;
set D1;
if date=. then output step1;
else output step2;
run;
data merge;
DateN(keep=ID Amount) D2;
by id;
run;
data x;
set merge DateY;
run;
proc sort data=x;
by ID;
run;
This assumes that the missing values from D1 have unique ID.
Upvotes: 1