Bostonian
Bostonian

Reputation: 687

The execution plan for mixed LEFT JOIN and INNER JOIN in SQL query

I am testing a in-house relational DB and I cannot figure out the why the optimizer could figure out this query plan.

SELECT * from A LEFT JOIN B on A.x = B.x INNER JOIN C on B.y = C.y

The execution plan reported by this database could be written like below psuedo-code:

For each tupleB in B
   For each tupleC in C
     INNER JOIN tupleB and tupleC
     For each tupleA in A
         INNER JOIN tupleA

The result for this plan is correct.

If all the JOINs are INNER joins, this plan looks reasonable to me since INNER JOIN is both commutative and associative.

However, when the LEFT JOIN and INNER JOIN are mixed, how does the optimizer could figure out INNER_JOIN(INNER_JOIN(B,C),A) share the same result with INNER_JOIN(LEFT_JOIN(A,B),C)?

Is there a theory to prove this or this just happens case by case?

Upvotes: 0

Views: 819

Answers (1)

Radim Bača
Radim Bača

Reputation: 10701

The equivalence of plans INNER_JOIN(LEFT_JOIN(A,B),C) and INNER_JOIN(INNER_JOIN(B,C),A) has two steps:

  1. INNER_JOIN(LEFT_JOIN(A,B),C) is equivalent to INNER_JOIN(INNER_JOIN(A,B),C)
  2. INNER_JOIN(INNER_JOIN(A,B),C) is equivalent to INNER_JOIN(INNER_JOIN(B,C),A)

The first equivalence is the more difficult to see. Once you perform LEFT_JOIN(A,B) you may have rows from A not having a counterpart in B. These rows are the only rows that are not in INNER_JOIN(A,B). These rows will have NULL values in B attributes. Subsequently, you perform an inner join with C using B.y and those extra rows have to disappear in the final result since the B.y is NULL and the join condition B.y = C.y is always evaluated to uknown. Therefore, in the final result you have only those rows which are product of INNER_JOIN(A,B) even if you process the LEFT_JOIN(A,B).

The second equivalence is commes from the join associativity

Upvotes: 1

Related Questions