Raja
Raja

Reputation: 6824

How to remove duplicates in a complicated JOIN

I've an EMPLOYEE table which has built-in hierarchy structure as well (using manager column)

EMPLOYEE table

I've another REGION table which represents manager-region relation

REGION table

I'm trying to create a SQL which will show which employees fall under which region by following up the hierarchy chain.

Constraints / Rules:

This is the naive SQL I came up with (but results have duplicates - 3rd Rule failed)

select distinct e.name, r.region
from employee e 
left outer join employee mgr1 on mgr1.id = e.manager 
left outer join employee mgr2 on mgr2.id = mgr1.manager 
left outer join employee mgr3 on mgr3.id = mgr2.manager 
left outer join employee mgr4 on mgr4.id = mgr3.manager 
left outer join REGION r on 
      (  r.id = mgr1.id 
      or r.id = mgr2.id 
      or r.id = mgr3.id 
      or r.id = mgr4.id  ) 

where e.IS_MANAGER = 'N'; //only interested in users for now; assume a flag

And this is the ResultSet:

RESULTS

How do I conditionally stop the left outer join if I already found a region?

Upvotes: 1

Views: 95

Answers (3)

Clockwork-Muse
Clockwork-Muse

Reputation: 13056

DB2 (almost all versions) supports recursive CTEs, which are made to handle this kind of hierarchical data (some versions also support Oracles CONNECT BY, but I don't have any familiarity with that). Using it may make the connections easier to reason about:

WITH Employee_Region AS (SELECT name, manager, CAST(null AS VARCHAR(2)) AS region
                         FROM Employee
                         WHERE is_manager = 'N'
                         UNION ALL
                         SELECT ER.name, Manager.manager, Region.regionName
                         FROM Employee_Region ER
                         JOIN Employee Manager
                           ON Manager.id = ER.manager
                         LEFT JOIN Region 
                                ON Region.id = Manager.regionId
                         WHERE ER.region IS NULL)
SELECT name, region
FROM Employee_Region
WHERE region IS NOT NULL

SQL Fiddle Example
(Fiddle base taken from @PhilWalton - thanks! PostgreSQL requires the RECURSIVE keyword, but DB2 doesn't)

The query does start from the bottom (assuming you have a flag), but it's possible to reverse it and start from the top-level managers.
For the recursing section (everything in the CTE after UNION ALL):

  • we self-JOIN to the Employee table to get the next manager up
  • because we don't know if the current manager has a region, we LEFT JOIN to that table.
  • if we didn't find a region, try again, using the next manager up.
  • we exclude all rows who, in the previous iteration, received a region (will grab the 'lowest' region). Otherwise, termination stops at the top-level manager.

Finally, in the main query, we exclude those rows where an employee has no region. Mostly this will remove the intermediate rows generated wile searching until a manager with a region is found, although if some tree has no region (somehow) it would exclude them.
Using a WHERE is almost certainly cheaper than the hashing function required with DISTINCT, although I'm not sure what effect the recursive portion will have (if most immediate managers have a region, or within one hop, it's likely to perform better than doing four joins)

Upvotes: 0

Phil Walton
Phil Walton

Reputation: 963

I had to modify your script a bit, but this works:

select distinct e.Name,
  CASE
    WHEN r1.RegionName IS NOT NULL THEN r1.RegionName
    WHEN r2.RegionName IS NOT NULL THEN r2.RegionName
    WHEN r3.RegionName IS NOT NULL THEN r3.RegionName
    WHEN r4.RegionName IS NOT NULL THEN r4.RegionName
    ELSE 'NA'
  END AS 'RegionName'
from employee e 
left outer join employee mgr1 on mgr1.id = e.Manager 
left outer join employee mgr2 on mgr2.id = mgr1.Manager 
left outer join employee mgr3 on mgr3.id = mgr2.Manager 
left outer join employee mgr4 on mgr4.id = mgr3.Manager 
left outer join Region r1 on r1.id = mgr1.RegionID 
left outer join Region r2 on r2.id = mgr2.RegionID 
left outer join Region r3 on r3.id = mgr3.RegionID 
left outer join Region r4 on r4.id = mgr4.RegionID 

where e.IS_MANAGER = 'N';

Here's the SQL Fiddle: http://sqlfiddle.com/#!9/93b45/5

Upvotes: 1

MauriDev
MauriDev

Reputation: 445

Try this:

select distinct e.name, COALESCE(r1.region, r2.region, r3.region, r4.region, 'No Region') region
from employee e left outer join 
                region r1 on e.manager = r1.id
left outer join employee mgr1 on mgr1.id = e.manager left outer join
                region r2 on mgr1.manager = r2.id
left outer join employee mgr2 on mgr2.id = mgr1.manager left outer join
                region r3 on mgr2.manager = r3.id
left outer join employee mgr3 on mgr3.id = mgr2.manager left outer join
                region r4 on mgr3.manager = r4.id
where e.IS_MANAGER = 'N'; //only interested in users for now; assume a flag

I'm not sure the COALESCE function is supported on all mysql versions but you could find an equivalent (it returns the first non null argument).

Upvotes: 1

Related Questions