fussmonkey
fussmonkey

Reputation: 678

SQL recursive logic

I have a situation where I need to configure existing client data to address a problem where our application was not correctly updating IDs in a table when it should have been.

Here's the scenario. We have a parent table, where rows can be inserted that effectively replace existing rows; the replacement can be recursive. We also have a child table, which has a field that points to the parent table. In existing data, the child table could be pointing at rows that have been replaced, and I need to correct that. I can't simply update each row to the replacing row, however, because that row could have been replaced as well, and I need the latest row to be reflected.

I was trying to find a way to write a CTE that would accomplish this for me, but I'm struggling to find a query that finds what I'm actually looking for. Here's a sample of the tables that I'm working with; the 'ShouldBe' column is what I'd like my update query to end up with, taking into account the recursive replacement of some of the rows.

DECLARE @parent TABLE (SampleID int, 
                   SampleIDReplace int,
                   GroupID char(1))

INSERT INTO @parent (SampleID, SampleIDReplace, GroupID)
VALUES (1, -1, 'A'), (2, 1, 'A'), (3, -1, 'A'), 
       (4, -1, 'A'), (5, 4, 'A'), (6, 5, 'A'),
       (7, -1, 'B'), (8, 7, 'B'), (9, 8, 'B')


DECLARE @child TABLE (ChildID int, ParentID int)
INSERT INTO @child (ChildID, ParentID)
VALUES (1, 4), (2, 7), (3, 1), (4, 3)

Desired results in child table, after the update script has been applied:

ChildID     ParentID    ParentID_ShouldBe
1           4           6 (4 replaced by 5, 5 replaced by 6)
2           7           9 (7 replaced by 8, 8 replaced by 9)
3           1           2 (1 replaced by 2)
4           3           3 (unchanged, never replaced)

Upvotes: 6

Views: 988

Answers (3)

BrianC
BrianC

Reputation: 314

I've got a iterative SQL loop that I think sorts this out as follows:

WHILE EXISTS (SELECT * FROM #child C INNER JOIN #parent P ON C.ParentID = P.SampleIDReplace WHERE P.SampleIDReplace > -1)
BEGIN
    UPDATE #child
    SET ParentID = SampleID
    FROM #parent 
    WHERE #child.ParentID = SampleIDReplace
END

Basically, the while condition compares the contents of the parent ID column in the child table and sees if there is a matching value in the SampleIDReplace column of the parent table. If there is, it goes and gets the SampleID of that record. It only stops when the join results in every SampleIDReplace being -1, meaning we have nothing else to do.

On your sample data, the above results in the expected output.

Note that I had to use temp tables rather than table variables here in order for the table to be accessible within the loop. If you have to use table variables then there would need to be a bit more surgery done.

Clearly if you have deep replacement hierarchies then you'll do quite a few updates, which may be a consideration when looking to perform the query against a production database.

Upvotes: 0

Lamak
Lamak

Reputation: 70638

Ok, so it took me a while and there are probably better ways to do it, but here is one option.

DECLARE @parent TABLE (SampleID int, 
                   SampleIDReplace int,
                   GroupID char(1))

INSERT INTO @parent (SampleID, SampleIDReplace, GroupID)
VALUES (1, -1, 'A'), (2, 1, 'A'), (3, -1, 'A'), 
       (4, -1, 'A'), (5, 4, 'A'), (6, 5, 'A'),
       (7, -1, 'B'), (8, 7, 'B'), (9, 8, 'B')


DECLARE @child TABLE (ChildID int, ParentID int)
INSERT INTO @child (ChildID, ParentID)
VALUES (1, 4), (2, 7), (3, 1), (4, 3)


;WITH RecursiveParent1 AS
(
    SELECT SampleIDReplace, SampleID, 1 RecursionLevel
    FROM @parent
    WHERE SampleIDReplace != -1
    UNION ALL
    SELECT A.SampleIDReplace, B.SampleID, RecursionLevel + 1
    FROM RecursiveParent1 A
    INNER JOIN @parent B
        ON A.SampleId = B.SampleIDReplace
),RecursiveParent2 AS
(
    SELECT  *, 
            ROW_NUMBER() OVER(PARTITION BY SampleIdReplace ORDER BY RecursionLevel DESC) RN
    FROM RecursiveParent1
)
SELECT A.ChildID, ISNULL(B.ParentID,A.ParentID) ParentID
FROM @child A
LEFT JOIN ( SELECT SampleIDReplace, SampleID ParentID 
            FROM RecursiveParent2
            WHERE RN = 1) B
    ON A.ParentID = B.SampleIDReplace
OPTION(MAXRECURSION 500)

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269633

The following returns what you are looking for:

with cte as (
    select sampleid, sampleidreplace, 1 as num
    from @parent
    where sampleidreplace <> -1
    union all
    select p.sampleid, cte.sampleidreplace, cte.num+1
    from @parent p join
         cte
         on p.sampleidreplace = cte.sampleId
)
select c.*, coalesce(p.sampleid, c.parentid)
from @child c left outer join
     (select ROW_NUMBER() over (partition by sampleidreplace order by num desc) as seqnum, *
      from cte
     ) p
     on c.ParentID = p.SampleIDReplace and p.seqnum = 1

The recursive part keeps track of every correspondence (4-->5, 4-->6). The addition number is a "generation" count. We actually want the last generation. This is identified by using the row_number() function, ordering by the num in decreasing order -- hence the p.seqnum = 1.

Upvotes: 4

Related Questions