PotentialLime
PotentialLime

Reputation: 91

Understanding Order of ON Clause in Self-Joins (SQL)

I am trying to understand the SQL self-join - especially how the order of the ON clause matters in the query. This is probably a basic question but please bear with me as I'm a beginner in query language.

This is actually a LeetCode Question - #181 where I'm trying to get the employee whose salary is higher than their manager. You can check out the schema through the LeetCode link or the SQL Fiddle example I've provided below.

Question:

Basically I'm trying to understand the difference in output when I run the below two queries:

I changed the order of the ON clause From (ON e.ManagerId = m.Id) to (ON m.ManagerId = e.Id) and I'm getting the inverse answer from the desired output. I thought because it's a self-join, the order wouldn't matter since I'm extracting information from the identical table.

Please let me know what I'm missing and also point to any directions if possible! Thanks in advance!

1) Correct Query to get Desired Output

Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary

SQL Fiddle Example

2) Incorrect Query

Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary

SQL Fiddle Example

Upvotes: 2

Views: 1047

Answers (4)

JGFMK
JGFMK

Reputation: 8904

You might want to think of it as only the ids in the Manager_id columns are Managers.

So to get their names you could do:

select name from Employee where id in (select distinct ManagerId from Employee)

  • distinct is optional. I would tend to do it, if I was debegging the nested select, as it would make sense to only see same ManagerId in there once. Two+ Employees can potentially have the same Manager.

Try running the queries without the where clause, you will see the same results but column order has switched. This is because of the ON clause:

  • ON e.ManagerId = m.Id

    (Employee e to Manager m)

    Or longhand join Employee ManagerId, to Manager Id

    Joe as the Employee, with Sam as the Manager (ascending hierarchy as you read across columns)

  • ON m.ManagerId = e.Id

    (Manager m to Employee e)

    Or longhand join Manager ManagerId, to Employee Id

    Sam as the Manager, with Joe as the Employee (descending hierarchy as you read across columns)

Column order not withstanding, if you were to flip the WHERE clause from > to <= when you flipped the ON prefix order you would yield the same results.

Upvotes: 1

Jon Wilson
Jon Wilson

Reputation: 776

I think you are not realizing that the table alaises refer to the people. The m copy of the table is the manager, so the variable m.managerId would refer to the manager of the manager. Which is not what you want. So the correct link:

e.ManagerId = m.Id

is linking an employee row's manager to the manager row's ID.

Upvotes: 3

LordBaconPants
LordBaconPants

Reputation: 1414

The functionally the order doesn't matter (so, 'ON e.ManagerId = m.Id' is the same as 'ON m.Id = e.ManagerId').

What you are doing here is joining on different columns, which represent different things.

In the incorrect query, you are saying "the managers managerID is the same as the employees ID", which isn't true. Managers (as you've got it in your table) don't have managers themselves.

What you've essentially done is inverse the join. If you were to swap your sign around in you where statement, so WHERE e.Salary > m.Salary to WHERE e.Salary < m.Salary you'd get the same answer as your correct query

Upvotes: 3

Thorsten Kettner
Thorsten Kettner

Reputation: 94939

In both queries you are joining one employee with another. In the first query, however, you call the subordinate e and the manager m, while in the second you call the manager e and the subordinate m. Let's look at this more closely:

Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary

You are combining an employee (that you call e for short) with their manager (an employee called m here, the ID of which is referenced as the manager ID in the employee record). Then you only keep employee / manager pairs where the employee's salary is greater than the manager's.

Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary

You are combining an employee (that you call e for short) with their subordinate (an employee called m here, the manager ID of which is referencing the employee record). So, the employee that you call e is the other employee's manager. Then you only keep employee (manager) / subordinate pairs where the manager's salary is greater than the subordinate's.

Upvotes: 3

Related Questions