Reputation: 91
I am trying to understand the SQL self-join - especially how the order of the ON clause matters in the query. This is probably a basic question but please bear with me as I'm a beginner in query language.
This is actually a LeetCode Question - #181 where I'm trying to get the employee whose salary is higher than their manager. You can check out the schema through the LeetCode link or the SQL Fiddle example I've provided below.
Basically I'm trying to understand the difference in output when I run the below two queries:
I changed the order of the ON clause From (ON e.ManagerId = m.Id) to (ON m.ManagerId = e.Id) and I'm getting the inverse answer from the desired output. I thought because it's a self-join, the order wouldn't matter since I'm extracting information from the identical table.
Please let me know what I'm missing and also point to any directions if possible! Thanks in advance!
1) Correct Query to get Desired Output
Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary
2) Incorrect Query
Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary
Upvotes: 2
Views: 1047
Reputation: 8904
You might want to think of it as only the ids in the Manager_id columns are Managers.
So to get their names you could do:
select name from Employee where id in (select distinct ManagerId from Employee)
Try running the queries without the where clause, you will see the same results but column order has switched. This is because of the ON clause:
ON e.ManagerId = m.Id
(Employee e to Manager m)
Or longhand join Employee ManagerId, to Manager Id
Joe as the Employee, with Sam as the Manager (ascending hierarchy as you read across columns)
ON m.ManagerId = e.Id
(Manager m to Employee e)
Or longhand join Manager ManagerId, to Employee Id
Sam as the Manager, with Joe as the Employee (descending hierarchy as you read across columns)
Column order not withstanding, if you were to flip the WHERE
clause from >
to <=
when you flipped the ON
prefix order you would yield the same results.
Upvotes: 1
Reputation: 776
I think you are not realizing that the table alaises refer to the people. The m copy of the table is the manager, so the variable m.managerId would refer to the manager of the manager. Which is not what you want. So the correct link:
e.ManagerId = m.Id
is linking an employee row's manager to the manager row's ID.
Upvotes: 3
Reputation: 1414
The functionally the order doesn't matter (so, 'ON e.ManagerId = m.Id' is the same as 'ON m.Id = e.ManagerId').
What you are doing here is joining on different columns, which represent different things.
In the incorrect query, you are saying "the managers managerID is the same as the employees ID", which isn't true. Managers (as you've got it in your table) don't have managers themselves.
What you've essentially done is inverse the join. If you were to swap your sign around in you where
statement, so WHERE e.Salary > m.Salary
to WHERE e.Salary < m.Salary
you'd get the same answer as your correct query
Upvotes: 3
Reputation: 94939
In both queries you are joining one employee with another. In the first query, however, you call the subordinate e and the manager m, while in the second you call the manager e and the subordinate m. Let's look at this more closely:
Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary
You are combining an employee (that you call e for short) with their manager (an employee called m here, the ID of which is referenced as the manager ID in the employee record). Then you only keep employee / manager pairs where the employee's salary is greater than the manager's.
Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary
You are combining an employee (that you call e for short) with their subordinate (an employee called m here, the manager ID of which is referencing the employee record). So, the employee that you call e is the other employee's manager. Then you only keep employee (manager) / subordinate pairs where the manager's salary is greater than the subordinate's.
Upvotes: 3