Elijah
Elijah

Reputation: 306

Why many to many merge doesn't do cartesian product

data jul11.merge11;
input month sales ;
datalines ;
1 3123
1 1234
2 7482
2 8912
3 1284

;
run;


data jul11.merge22;
input month goal ;
datalines;
1 4444
1 5555
1 8989
2 9099
2 8888
3 8989
;
run;

data jul11.merge1;
merge jul11.merge11  jul11.merge22 ;
by month;
difference =goal - sales ;

run;

proc print data=jul11.merge1 noobs;
run;

output:

month sales goal difference

1   3123    4444    1321

1   1234    5555    4321

1   1234    8989    7755

2   7482    9099    1617

2   8912    8888    -24

3   1284    8989    7705

Why it didn't match all observation in table 1 with in table 2 for common months ?

pdv retains data of observation to seek if any more observation are left for that particular by group before it reinitialises it , in that case it should have done cartesian product .

Gives perfect cartesian product for one to many merging but not for many to many .

Upvotes: 0

Views: 2378

Answers (2)

Joe
Joe

Reputation: 63424

This is because of how SAS processes the data step. A merge is never a true cartesian product (ie, all records are searched and matched up against all other records, like a SQL comma join might ); what SAS does (in the case of two datasets) is it follows down one dataset (the one on the left) and advances to the next particular by-group value; then it looks over on the right dataset, and advances until it gets to that by group value. If there are other records in between, it processes those singly. If there are not, but there is a match, then it matches up those records.

Then it looks on the left to see if there are any more in that by group, and if so, advances to the next. It does the same on the right. If only one of these has a match then it will only bring in those values; hence if it has 1 element on the left and 5 on the right, it will do 1x5 or 5 rows. However, if there are 2 on the left and 3 on the right, it won't do 2x3=6; it does 1:1, 2:2, and 2:3, because it's advancing record pointers sequentially.

The following example is a good way to see how this works. If you really want to see it in action, throw in the data step debugger and play around with it interactively.

data test1;
input x row1;
datalines;
1 1
1 2
1 3
1 4
2 1
2 2
2 3
3 1
;;;;
run;

data test2;
input x row2;
datalines;
1 1
1 2
1 3
2 1
3 1
3 2
3 3
;;;;
run;


data test_merge;
merge test1 test2;
by x;
put x= row1= row2=;
run;

If you do want to do a cartesian join in SAS datastep, you have to do nested SET statements.

data want;
set test1;
do _n_ = 1 to nobs_2;
  set test2 point=_n_ nobs=nobs_2;
  output;
end;
run;

That's the true cartesian, you can then test for by group equality; but that's messy, really. You could also use a hash table lookup, which works better with BY groups. There are a few different options discussed here.

Upvotes: 3

Chris J
Chris J

Reputation: 7769

SAS doesn't handle many-to-many merges very well within the datastep. You need to use a PROC SQL if you want to do a many-to-many merge.

Upvotes: 0

Related Questions