Neil Wightman
Neil Wightman

Reputation: 1133

Oracle + SQL Server tree query issue

I'm having a little problem getting SQL Server returning the same results as Oracle for our SQL tree query.

We have a link table with attributes

CHILD_ID, LINK_ID, PARENT_ID

and a element table with ELEMENT_ID

In Oracle we use this

SELECT * FROM LINKS
JOIN ELEMENTS ON ELEMENT_ID = CHILD_ID
WHERE LINK_ID IN 
(SELECT LINK_ID FROM LINKS CONNECT BY PRIOR CHILD_ID = PARENT_ID START WITH PARENT_ID = 'startid')

In SQL Server we use this

WITH TREE_LINKS AS
(SELECT CHILD_ID, LINK_ID FROM LINKS WHERE PARENT_ID = 'startid'
UNION ALL
SELECT CURRENT_LINKS.CHILD_ID, CURRENT_LINKS.LINK_ID
FROM LINKS CURRENT_LINKS
INNER JOIN TREE_LINKS t1 ON CURRENT_LINKS.PARENT_ID = t1.CHILD_ID)

SELECT * FROM TREE_LINKS
INNER JOIN LINKS ON TREE_LINKS.LINK_ID = LINKS.LINK_ID 
INNER JOIN ELEMENTS ON ELEMENTS.ELEMENT_ID = TREE_LINKS.CHILD_ID

This works perfectly fine apart from 1 issue.

In Oracle we only get each unique link, based on LINK_ID. In SQL Server we get all the links describing the full tree which can include duplicates when 1 element exists below multiple other elements in different branches of the structure.

This means we get a large amounts of duplicate rows from SQL Server. I tested adding

SELECT DISTINCT TREE_LINKS.LINK_ID AS TREE_LINK_ID, * from TREE_LINKS 

in the last select statement but it just takes as long as the server has more work to remove the duplicates if found in different branches.

In 1 test case at the moment we have Oracle returning 20,000 rows and SQL Server returning 1.6 million rows. So far I have found no way to get SQL Server returning the same results as fast.

FYI : Adding DISTINCT in the recursion causes

DISTINCT operator is not allowed in the recursive part of a recursive common table expression 'TREE_LINKS'.

Edit: - An example

If we have links like this

PARENT_ID, LINK_ID, CHILD_ID
1          1        2
2          2        3 
3          3        4
1          4        3

There are 4 unique elements, but there are 6 elements in the complete tree. This is because there are 2 paths to element 3

Upvotes: 0

Views: 663

Answers (1)

craniumonempty
craniumonempty

Reputation: 3535

SELECT LINK_ID
  FROM LINKS
  START WITH PARENT_ID = 'startid'
  CONNECT BY PRIOR CHILD_ID = PARENT_ID

should be equivalent to

EDIT BELOW (AND in first edit): changed a PARENT_ID to LINK_ID in the second select

WITH TREE_LINKS(CHILD_ID, LINK_ID) AS 
   (SELECT CHILD_ID, LINK_ID
    FROM LINKS
    WHERE PARENT_ID = 'startid'
        UNION ALL
    SELECT NLINKS.CHILD_ID, NLINKS.LINK_ID
    FROM LINKS as NLINKS, TREE_LINKS
    WHERE TREE_LINKS.CHILD_ID = NLINKS.PARENT_ID)

SELECT LINK_ID FROM TREE_LINKS

according to Simulation of CONNECT BY PRIOR of ORACLE in SQL SERVER

The rest should be similar from what I can tell.

EDIT: maybe something like:

WITH TREE_LINKS(CHILD_ID, LINK_ID) AS 
   (SELECT CHILD_ID, LINK_ID
    FROM LINKS
    WHERE PARENT_ID = 'startid'
        UNION ALL
    SELECT NLINKS.CHILD_ID, NLINKS.LINK_ID
    FROM LINKS as NLINKS, TREE_LINKS
    WHERE TREE_LINKS.CHILD_ID = NLINKS.PARENT_ID)

SELECT * FROM LINKS as TLINKS
JOIN ELEMENTS ON ELEMENT_ID = TLINKS.CHILD_ID
WHERE TLINKS.LINK_ID IN
(SELECT TREE_LINKS.LINK_ID FROM TREE_LINKS)

but I don't have mssql or oracle to test it on currently

EDIT: removed "AND TREE_LINKS.LINK_ID <> NLINKS.LINK_ID" from query as it was not needed.

Upvotes: 1

Related Questions