kalyan4uonly
kalyan4uonly

Reputation: 335

How to tune the multiple self join query in SQL Server

I have table like as below

enter image description here

I need to fetch summary based on issue id and parent id, I need to fetch actual parent summary as parent desc and preceding summary as succeed desc. There will be 5 level hierarchy (max).

So I self-joined the table for 5 times, the below code is the example code. But the query is taking almost 250 seconds to execute. How to optimize the code in SQL Server?

Expected output

enter image description here

WITH  cte_a AS
(
    SELECT
        'AON property' AS summary, 1001 AS issue_id, 2001 AS parent_id
    UNION ALL
    SELECT 'AON property L1', 2001, 3001
    UNION ALL
    SELECT 'AON Property L2', 3001, 4001
    UNION ALL
    SELECT 'AON Property L3', 4001, NULL
    UNION ALL
    SELECT 'LONG CHAIN CLUBS', 1002, 2222
    UNION ALL
    SELECT 'LONG CHAIN L1', 2222, 3003
    UNION ALL
    SELECT 'LONG CHAIN L2', 3003, NULL
)
SELECT
    a.*,
    CASE 
        WHEN f.summary IS NOT NULL THEN e.summary
        WHEN e.summary IS NOT NULL THEN d.summary
        WHEN d.summary IS NOT NULL THEN c.summary
        WHEN c.summary IS NOT NULL THEN b.summary
        WHEN b.summary IS NOT NULL THEN then a.summary 
    END AS succeed_desc,
    COALESCE (f.summary, e.summary, d.summary, c.summary, b.summary, a.summary) AS parent_desc
FROM
    cte_a a 
LEFT JOIN 
    cte_a b ON a.parent_id = b.issue_id --Level1
LEFT JOIN 
    cte_a c ON b.parent_id = c.issue_id --Level2
LEFT JOIN 
    cte_a d ON c.parent_id = d.issue_id --Level3
LEFT JOIN 
    cte_a e ON d.parent_id = e.issue_id --Level4
LEFT JOIN 
    cte_a f ON e.parent_id = f.issue_id --Level5

Upvotes: 0

Views: 149

Answers (1)

Atmo
Atmo

Reputation: 3975

I agree with Martin Smith saying in the comments that you need to define an INDEX.
With that said, you also need to look into recursive CTEs. summary, issue_id, parent_id all come from cte_a; parent_desc is going to be the result of the recursive CTE and succeed_desc is the result of a simple join.

I will work my way through in 3 steps.

Step 1: Building up the recursive CTE

A recursive CTE is in the form:

WITH Recursive_CTE AS (
    SELECT ...
    FROM SomeTable
    UNION ALL
    SELECT ...
    FROM Recursive_CTE
    JOIN SomeTable ON ...
)

In your case, you want to browse into records from a parent_id to the next issue_id. Importantly, you want to remember the original issue_id and parent_id to build the final result AND the joined parent_id to keep your recursion going.
For clarity's sake, I will add a column so you can keep track of the recursion (I will drop it later).

WITH cte_a(summary, issue_id, parent_id) AS
(
   ...
), cte_b(summary, issue_id, parent_id, ancestor_id, loop_number) AS (
    SELECT summary, issue_id, parent_id, parent_id, 0 from cte_a
    UNION ALL
    SELECT cte_b.summary, cte_b.issue_id, cte_b.parent_id, cte_a.parent_id, 1+loop_number
    FROM cte_a
    JOIN cte_b ON cte_a.issue_id = cte_b.ancestor_id
)
SELECT *
FROM cte_b

Note that parent_id appears twice in the initialisation of the recursive CTE. You should not be surprised by that: 1 is to keep the original value, the other is for the loop. The difference shows after the UNION ALL (cte_b.parent_id vs cte_a.parent_id).

Step 2: Preparing the output for the final query

We now need to tweak the above query in order to:

  • Add the parent_desc column (that will be in the recursive CTE)
  • Allow the additiona of the succeed_desc column (that will also be in the CTE)
  • Keep only the records we want to keep in the final result (that will be outside the CTE)

For that, we add 2 columns in the CTE and the records we want to keep are the ones where the recursion stopped (ancestor_id IS NULL):

WITH cte_a(summary, issue_id, parent_id) AS
(
    ...
), cte_b(summary, issue_id, parent_id, penultimate_ancestor_id, ancestor_id, ancestor_summary) AS (
    SELECT summary, issue_id, parent_id, issue_id, parent_id, summary from cte_a
    UNION ALL
    SELECT cte_b.summary, cte_b.issue_id, cte_b.parent_id,
           cte_a.issue_id, cte_a.parent_id, cte_a.summary
    FROM cte_a
    JOIN cte_b ON cte_a.issue_id = cte_b.ancestor_id
)
SELECT *
FROM cte_b
WHERE ancestor_id IS NULL

Step 3: Finalizing the query

In the previous step, we added the penultimate_ancestor_id column to keep track of the last issue_id encountered before reaching NULL. succeed_desc can easily be obtained using it.

WITH cte_a(summary, issue_id, parent_id) AS
(
    ...
), cte_b(summary, issue_id, parent_id, penultimate_ancestor_id, ancestor_id, ancestor_summary) AS (
    SELECT summary, issue_id, parent_id, issue_id, parent_id, summary from cte_a
    UNION ALL
    SELECT cte_b.summary, cte_b.issue_id, cte_b.parent_id,
           cte_a.issue_id, cte_a.parent_id, cte_a.summary
    FROM cte_a
    JOIN cte_b ON cte_a.issue_id = cte_b.ancestor_id
)
SELECT subquery.summary,
       subquery.issue_id,
       subquery.parent_id,
       cte_a.summary as succeed_desc,
       subquery.ancestor_summary as parent_desc
FROM (SELECT * FROM cte_b WHERE ancestor_id IS NULL) subquery
LEFT OUTER JOIN cte_a ON subquery.parent_id IS NOT NULL
                     AND subquery.penultimate_ancestor_id = cte_a.parent_id
ORDER BY parent_desc, issue_id

Upvotes: 1

Related Questions