shawnt00
shawnt00

Reputation: 17915

Does the order of expressions or predicates make any difference in the join condition for outer joins in standard SQL with ANSI joins?

Take a typical left outer join scenario. We all know that the order of the tables is quite significant, e.g., Q1 and Q2 are not equivalent:

SELECT A.x, B.y FROM A LEFT OUTER JOIN B ON A.id = B.id -- (Q1)
SELECT A.x, B.y FROM B LEFT OUTER JOIN A ON B.id = A.id -- (Q2)

When I think conceptually about multiple joins it usually seems natural to me to imagine picking up the new table as the object of interest and then describing how its rows are related to what's come before. Keeping the terms in parallel doesn't have any advantage to me and by my own habit I generally write the join condition this way:

SELECT A.x, B.y FROM A LEFT OUTER JOIN B ON B.id = A.id -- (Q3)

I had a conversation with a former coworker who misunderstood how the syntax works. To this coworker, Q3 was wrong and Q1 was right. And I do recognize that with the old-style outer join syntax this would matter and that's likely the source of this confusion. I've never heard or seen anyone else make this case using ANSI joins. Please answer this question and redeem my reputation, point out something I've overlooked, or offer deeper insight into the erroneous perspective.

Does the order of expressions or predicates make any difference in the join condition for outer joins in standard SQL with ANSI joins?

Upvotes: 3

Views: 494

Answers (4)

ErikE
ErikE

Reputation: 50241

I prefer your Q1, however it makes absolutely no difference in performance and has absolutely no effect on the query optimizer.

To me, putting the previous table first gives me the relevant information sooner in my scanning process. I can read join table B on A ... and already know which two tables are being joined together. When I read join table B on B.blasdjasdid = ... I've had to scan much farther and still don't know the most important information, which table is being joined to (which is a sort of namespace, a domain under which the column name will be understood). Additionally, if the columns are named the same in both tables (idiomatic in any database I design), I can avoid scanning to the end entirely, reading only join table B on A.SomethingId ... and already knowing that it's = B.SomethingId.

Going a little deeper, I would encourage you to ask about this occurrence on workplace.stackexchange.com because I suspect that the reasons your services were discontinued do not match what they told you; some inquiry into this might be productive. I'm not suggesting it was your fault, but that the reason given was likely a pretext.

Upvotes: 1

Michael Buen
Michael Buen

Reputation: 39393

I prefer Q3's JOIN's condition order too:

... ON B.id = A.id -- (Q3)

As it directly reflects that the B.id is the more varying one, you can think of A.id as the constant being tested against, e.g.

B.id = 1984

In the same vein that I don't want to see this in code...

1984 = B.id

...,like you I don't want to see this in query:

A.id = B.id

However, like most things in life, there are people who like little-endian, and there are those who like big-endian. Whatever mental model it may serve them on the preference they have chosen, they should be at least able to explain to you the rationale why they wanted A.id = B.id

I think, I have to switch my preference though, my(and your) preferred condition order doesn't work in some ORM, Linq in particular. I have yet to grasp why they impose that the condition should be in Q1's order:

from x in A
join y in B on x.id equals y.id

And reversing the condition (same as Q3, though in SQL query it is not an error) order results to syntax error, this won't be accepted by Linq:

from x in A
join y in B on y.id equals x.id

Now, I have to find the rationale why Microsoft Linq designers preferred the Q1 condition's order. And try to appreciate it if it makes sense, and just accept even it doesn't (yet) makes sense.


Regarding:

Does the order of expressions or predicates make any difference in the join condition for outer joins in standard SQL?

Results-wise, NO. Performance-wise, I have yet to see a query where the join's condition order makes the query faster. Even in forums I haven't seen anyone endorses reversing the condition to make the query faster.

If they can't explain to you the rationale or the mental model of their preferred condition's order serves, perhaps they are just doing Cargo Cult Programming or worse yet, Bikeshedding

Upvotes: 3

Guffa
Guffa

Reputation: 700352

No, it makes no difference.

Personally I prefer the style that you show in Q3, some of my workmates prefer the style in Q1. I don't know anyone who would ever consider either of them to be wrong.

The query optimiser turns the query inside out into something completely different, so the predicate doesn't even exist as a plain comparison any more when it's done with it. Usually it's a lookup in an index or a table, and as that can only be done in one direction, how the predicate was written makes no difference.

I checked (in SQL Server 2005) the execution plan of two queries with the predicate operands in different order, and as expected they are identical.

Upvotes: 5

Ned Batchelder
Ned Batchelder

Reputation: 375584

The order of the equality comparison makes no difference to the results of the join. But it might for inscrutable reasons affect the efficiency with which the result is computed. SQL optimizers are notorious for being affected by seemingly unimportant details like this.

Upvotes: 2

Related Questions