user1589188
user1589188

Reputation: 5736

Recursive CTE with both child and parent links

Hi I have a table that stores the mappings of my data, e.g.

Data                  Map
id | letter | type    l | r
------------------    -----
 1 |   AA   | HEAD    5 | 1
 2 |   BB   | HEAD    2 | 1
 3 |   CC   | HEAD    6 | 2
 4 |   DD   | HEAD    3 | 2
 5 |  END-1 |  END    7 | 3
 6 |  END-2 |  END    8 | 4
 7 |  END-3 |  END
 8 |  END-4 |  END

See http://sqlfiddle.com/#!3/4eccfe/5

I want to find out all the END type links from a given source, e.g. for AA, I get END-1, END-2, END-3; for BB, I get END-1, END-2, END-3; for CC, I get END-1, END-2, END-3; for DD, I get END-4

I have written what I want using recursive CTE:

;WITH data(id, letter, type) AS (
    SELECT '1', 'AA', 'HEAD' UNION SELECT '2', 'BB', 'HEAD' UNION SELECT '3', 'CC', 'HEAD' UNION
    SELECT '4', 'DD', 'HEAD' UNION SELECT '5', 'END-1', 'END' UNION SELECT '6', 'END-2', 'END' UNION
    SELECT '7', 'END-3', 'END' UNION SELECT '8', 'END-4', 'END'
), map (l, r) AS (
    SELECT '5', '1' UNION SELECT '2', '1' UNION
    SELECT '6', '2' UNION SELECT '3', '2' UNION
    SELECT '7', '3' UNION SELECT '8', '4'
), my_list (origin, source, target, target_type, sid, tid, level) AS (
    SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
    FROM data s JOIN map ON (s.id = l OR s.id = r)
    JOIN data t ON (t.id = l OR t.id = r)
    WHERE t.id <> s.id AND s.type <> 'END'
    UNION ALL
    SELECT my_list.origin, s.letter, t.letter, t.type, s.id, t.id, level + 1
    FROM data s JOIN map ON (s.id = l OR s.id = r)
    JOIN data t ON (t.id = l OR t.id = r) JOIN my_list ON s.id = my_list.tid
    WHERE t.id <> s.id AND s.type <> 'END' AND t.id <> my_list.sid
)
SELECT * FROM my_list
WHERE origin = 'BB' AND target_type = 'END'
ORDER BY level
GO

But the performance is not really good (on my real tables). Then I realise it is the ORs in the join conditions that are causing the issue, then I tried using UNION

my_list (origin, source, target, target_type, sid, tid, level) AS (
    SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
    FROM data s JOIN map ON s.id = l JOIN data t ON t.id = r
    WHERE s.type <> 'END'
    UNION ALL
    SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
    FROM data s JOIN map ON s.id = r JOIN data t ON t.id = l
    WHERE s.type <> 'END'
    UNION ALL
    SELECT my_list.origin, s.letter, t.letter, t.type, s.id, t.id, level + 1
    FROM data s JOIN map ON (s.id = l OR s.id = r)
    JOIN data t ON (t.id = l OR t.id = r) JOIN my_list ON s.id = my_list.tid
    WHERE t.id <> s.id AND s.type <> 'END' AND t.id <> my_list.sid
)

The difference is huge (on my real tables, the time is halved). For the above examples, I got

Table 'Worktable'. Scan count 6, logical reads 100

vs

Table 'Worktable'. Scan count 5, logical reads 75

But then when I tried to do the same for the recursive part, e.g.

my_list (origin, source, target, target_type, sid, tid, level) AS (
    SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
    FROM data s JOIN map ON s.id = l JOIN data t ON t.id = r
    WHERE s.type <> 'END'
    UNION ALL
    SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
    FROM data s JOIN map ON s.id = r JOIN data t ON t.id = l
    WHERE s.type <> 'END'
    UNION ALL
    SELECT my_list.origin, s.letter, t.letter, t.type, s.id, t.id, level + 1
    FROM data s JOIN map ON s.id = l
    JOIN data t ON t.id = r JOIN my_list ON s.id = my_list.tid
    WHERE s.type <> 'END' AND t.id <> my_list.sid
    UNION ALL
    SELECT my_list.origin, s.letter, t.letter, t.type, s.id, t.id, level + 1
    FROM data s JOIN map ON s.id = r
    JOIN data t ON t.id = l JOIN my_list ON s.id = my_list.tid
    WHERE s.type <> 'END' AND t.id <> my_list.sid
)

the result becomes slower (on my real tables, 5 times slower).

I wonder why is it slower and if there is other way to get rid of the ORs to speed up the query? Database is MS SQL SERVER 2008R2

Thank you

Upvotes: 2

Views: 166

Answers (1)

Lennart - Slava Ukraini
Lennart - Slava Ukraini

Reputation: 7181

I might have gotten this wrong, but can't you push the predicate:

WHERE origin = 'BB'

inside the CTE. I.e.:

;WITH my_list (origin, source, target, target_type, sid, tid, level) AS (
SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
FROM data s 
    JOIN map 
        ON s.id = l 
    JOIN data t 
        ON t.id = r
WHERE s.letter = 'BB'
UNION ALL
SELECT s.letter, s.letter, t.letter, t.type, s.id, t.id, 0
FROM data s 
    JOIN map 
        ON s.id = r 
    JOIN data t 
        ON t.id = l
WHERE s.letter = 'BB'
UNION ALL
SELECT my_list.origin, s.letter, t.letter, t.type, s.id, t.id, level + 1 
    FROM data s 
    JOIN map 
        ON (s.id = l OR s.id = r)
JOIN data t 
        ON (t.id = l OR t.id = r) 
    JOIN my_list 
        ON s.id = my_list.tid
WHERE t.id <> s.id 
      AND s.type <> 'END' 
      AND t.id <> my_list.sid
)
SELECT * FROM my_list
WHERE origin = 'BB' AND target_type = 'END'
ORDER BY level

Does that improve performance?

Upvotes: 1

Related Questions