Jie Hu
Jie Hu

Reputation: 539

SQL performance of self JOINs

I'm learning self-joins and I notice there are two kinds of statement:

-- use from ... where .... like cartesian
select e1.name as employee, e2.name as manager
from emp e1, emp e2
where e1.emp_id = e2.manager_id

-- use join ... on ...
select e1.name as employee, e2.name as manager
from emp e1 join emp e2
on e1.emp_id = e2.manager_id

I notice the plan time and execution time are almost the same, does it mean both methods are similar and I can choose whatever? Is there any trade-off to select one instead of another?

Thanks

Upvotes: 2

Views: 1691

Answers (2)

itsLex
itsLex

Reputation: 786

Yes, you can choose whatever. There is nothing wrong with either way. There is no rule that you should use JOIN and there is no police enforcing it. However some people think they must make those rules AND be the police.

That said, there are differences. Putting the join condition in the where-clause is somewhat old fashioned. Using JOIN and ON may be a bit more structured, although the other way is not blindingly confounding. When you are using outer joins, the syntax is clearer and more the same for different databases.

The most important difference is: when using ON you can add restrictions to the ON clause which are performed on the table before the join is performed, which can significantly improve performance. For example:

SELECT e1.name as manager, e2.name as employee
FROM emp e1 
JOIN emp e2  -- INNER is implicit
ON e1.emp_id = e2.manager_id  
AND e2.department = 'Sales'

Upvotes: 2

Barranka
Barranka

Reputation: 21047

You should always use join when defining relations in queries: That way you'll keep relations and conditions clearly and orderly separated.

"Keep the order, and order will keep you"

JOIN is meant to be used when defining the domain of the data set you're working for, and WHERE is meant to be used when getting a subset of the domain you've already defined. A quick check list I make when writing queries regarding the order of the execution of a SQL SELECT statement is:

  1. FROM
    • INNER JOIN
  2. WHERE
  3. GROUP BY (and aggregations)
  4. HAVING
  5. ORDER BY and SELECT

(Yes, I know it's simplistic, but it helps me). This way I keep things in order: First, I have to define the source of my data set, and that includes any relations I may need. After that, I can filter my data set to suit my needs, and then I can transform it. Only then I can see my data.

Consider the following example (using the where approach):

select e1.name as employee, e2.name as manager
from emp e1, emp e2
where e1.emp_id = e2.manager_id
  and e1.departament = 'Sales' -- When you add conditions to the query things can be 
                               -- confusing: Where the 'relations' end and the 'conditions'
                               -- begin?

Now, consider the same example, using join:

select e1.name as employee, e2.name as manager
from emp e1 inner join emp e2 on e1.emp_id = e2.manager_id -- Relations are clearly 
where e1.departament = 'Sales'                             -- separated, and you can focus
                                                           -- on filter conditions in the
                                                           -- WHERE clause

Another example: What if you need to use more than one table? If you use implicit relations (i.e. defining relations in the WHERE clause) you'll find yourself confused very soon. And what if you need to define OUTER JOINs?

Bottom line: Always use JOIN: It will help you

Upvotes: 6

Related Questions