ZDR
ZDR

Reputation: 47

one-to-many merge in SAS (with mismatching records)

I'm trying to do a merge in SAS which seems like it should be easy, but thus far haven't found anything that points to how to do it. I have two datasets - let's call them Monthly1 and Quarterly2. Monthly1 is the main one, and I want to merge Quarterly2 onto it. The problem is, the latter has a different periodicity, as the name implies. Here is what each looks like:

MONTHLY1:

id,month,year,qname,data1,data2,data3,data4

1111,01,2018,First Quarter 2018,27,33,55
1111,02,2018,First Quarter 2018,28,34,54
1111,03,2018,First Quarter 2018,28,37,51
1111,04,2018,Second Quarter 2018,28,30,59
1112,01,2018,First Quarter 2018,1,7,12
1112,02,2018,First Quarter 2018,2,7,10
1112,03,2018,First Quarter 2018,5,6,7
1112,04,2018,Second Quarter 2018,4,1,8

QUARTERLY2:

id,qname,data4,data5

1111,First Quarter 2018,53,7
1111,Second Quarter 2018,58,9
1112,First Quarter 2018,7,7
1112,Second Quarter 2018,7,9

Merging on id and qname. So for each month in MONTHLY1, data 4 is overwritten using the value in QUARTERLY2, and data 5 is added as a new variable. The final dataset should look like (I've added a * where data from QUARTERLY2 is either added or overwritten):

MERGED3:

id,month,year,qname,data1,data2,data3,data4,data5

1111,01,2018,First Quarter 2018,27,33,53*,7*
1111,02,2018,First Quarter 2018,28,34,53*,7*
1111,03,2018,First Quarter 2018,28,37,53*,7*
1111,04,2018,Second Quarter 2018,28,30,58*,9*
1112,01,2018,First Quarter 2018,1,7,7*,7*
1112,02,2018,First Quarter 2018,2,7,7*,7*
1112,03,2018,First Quarter 2018,5,6,7*,7*
1112,04,2018,Second Quarter 2018,4,1,7*,9*

Thanks!

Upvotes: 1

Views: 1901

Answers (2)

Parfait
Parfait

Reputation: 107767

Because SAS will overwrite with same named columns in a merge but only for first matches, consider renaming during the merge. Then, keep the columns you need. Also below demonstrates a left join merge (keeps all rows in Montlhy1 regardless if it matches or not with Quarterly1):

proc sort data=Monthly1; by id qname; run;
proc sort data=Quarterly2; by id qname; run;

data merged3;
    merge Monthly1(in=x RENAME=(data4=data4_x)) Quarterly2(in=y);
    by id qname;
    if x;
    keep id month year qname data1-data5; 
run;

Alternatively, consider a left join merge with proc sql and explicitly keep the columns you intend designated by table aliases. Again, left join is used:

proc sql;
    create table merged4 as
    select m.id, m.month, m.year, m.qname, m.data1, m.data2, m.data3, q.data4, q.data5
    from Monthly1 m
    left join Quarterly2 q
      on m.id = q.id AND m.qname = q.qname;
quit;

Upvotes: 0

Richard
Richard

Reputation: 27546

Usage Note 48705: A one-to-many merge with common variables that are not the BY variables will have values from the many data set after the first observation, states:

In a one-to-many merge with common variables that are not the BY variables, customers sometimes expect that the value for the common variable from the "one" data set will be retained throughout the BY group if the "one" variable is listed second in the MERGE statement. It is correct that in a one-to-one merge, and for the first matching observation in a one-to-many merge, the value of a common variable in the latter data set(s) overwrites the value from the previous data set. However, on subsequent iterations of the MERGE statement for the same BY group, the "one" data set is not read again. Therefore, the resulting output contains the value of the variable from the "many" data set.

You can change the behavior by renaming a non-by variable so it is not in common -- the new non-common variable value will be maintain across the subsequent many by matches and can be applied.

For example (your data)

Monthly (the many)

data have_monthly;
infile cards dlm=',';
length id month year 8 qname $30; input
id month year qname data1 data2 data3; data4=_n_*1000; datalines;
1111,01,2018,First Quarter 2018,27,33,55
1111,02,2018,First Quarter 2018,28,34,54
1111,03,2018,First Quarter 2018,28,37,51
1111,04,2018,Second Quarter 2018,28,30,59
1112,01,2018,First Quarter 2018,1,7,12
1112,02,2018,First Quarter 2018,2,7,10
1112,03,2018,First Quarter 2018,5,6,7
1112,04,2018,Second Quarter 2018,4,1,8
run;

Quarterly (the one)

data have_quarterly;
infile cards dlm=',';
length id 8 qname $30;
input id qname data4 data5; datalines;
1111,First Quarter 2018,53,7
1111,Second Quarter 2018,58,9
1112,First Quarter 2018,7,7
1112,Second Quarter 2018,7,9
run;

Default merge

data try (label="The quarter data4 values are not propagated");
  merge
    have_monthly
    have_quarterly
  ;
  by
    id
    qname
  ;
run;

Same merge with a rename to ensure non-commonality in non-by variable data4

data want(label="The quarter data4 values are propagated");
  merge
    have_monthly
    have_quarterly (rename=data4=_data4)
  ;
  by
    id
    qname
  ;
  data4 = _data4;
run;

Upvotes: 2

Related Questions