Find only self referencing values using recursive query

Question

Table:

create table tbl_nodess
(
    nod1 varchar(50),
    nod2 varchar(50)
);

Records:

insert into tbl_nodess values('A','B'); 
insert into tbl_nodess values('B','C');
insert into tbl_nodess values('C','B');
insert into tbl_nodess values('D','E');
insert into tbl_nodess values('E','F');
insert into tbl_nodess values('G','H');
insert into tbl_nodess values('B','D');
insert into tbl_nodess values('D','A');
insert into tbl_nodess values('I','J');
insert into tbl_nodess values('J','K');
insert into tbl_nodess values('K','L');
insert into tbl_nodess values('L','J');

Expected Result:

self_ref_nodes 
------------------
A-B-C-B 
A-B-D-A
I-J-K-L-J

Explanation about expected result: I want to find only those nodes which are self referencing.

Example:

node A referring B, B referring C, C referring B

The node C again referring to node B.

node A referring B, B referring D, D referring A

The node D again referring to node A.

node I referring J, J referring K, K referring L, L referring J

The node L again referring to node J(which is already in the chain).

Query:

;with cte AS (
      select nod1, nod2, 
             convert(varchar(max), ('-'+ nod1+ '-'+ nod2+ '-')) as nodes, 1 as lev
      from tbl_nodess n
      where not exists(select 1 from tbl_nodess n1 where n.nod1 = n1.nod2)
      union all
      select cte.nod1, n.nod2, 
             convert(varchar(max), (cte.nodes+ n.nod2+ '-')) as nodes, lev + 1
      from cte join
           tbl_nodess n
           on cte.nod2 = n.nod1
      where nodes not like ('%-'+ n.nod2+ '-%') 
     )
select nodes
from cte
where not exists 
(select 1
from cte cte2
where cte2.nodes like (cte.nodes+ '_%'))

Richard Hansell · Accepted Answer

I had a crack at this, but I had to make some major assumptions. First I stuck your data into a table variable:

DECLARE @tbl_nodess TABLE (nod1 VARCHAR(50), nod2 VARCHAR(50));
INSERT INTO @tbl_nodess VALUES('A','B'); 
INSERT INTO @tbl_nodess VALUES('B','C');
INSERT INTO @tbl_nodess VALUES('C','B');
INSERT INTO @tbl_nodess VALUES('D','E');
INSERT INTO @tbl_nodess VALUES('E','F');
INSERT INTO @tbl_nodess VALUES('G','H');
INSERT INTO @tbl_nodess VALUES('B','D');
INSERT INTO @tbl_nodess VALUES('D','A');
INSERT INTO @tbl_nodess VALUES('I','J');
INSERT INTO @tbl_nodess VALUES('J','K');
INSERT INTO @tbl_nodess VALUES('K','L');
INSERT INTO @tbl_nodess VALUES('L','J');

Then I wrote a recursive query:

WITH x AS (
    SELECT
        t.nod1,
        t.nod2,
        CONVERT(VARCHAR(500), t.nod1 + '-' + t.nod2) AS accumulator,
        0 AS done
    FROM
        @tbl_nodess t
    UNION ALL
    SELECT
        t.nod1,
        t.nod2,
        CONVERT(VARCHAR(500), x.accumulator + '-' + t.nod2) AS accumulator,
        CASE WHEN CHARINDEX(t.nod2, x.accumulator) != 0 THEN 1 ELSE 0 END AS done
    FROM
        x
        INNER JOIN @tbl_nodess t ON t.nod1 = x.nod2
    WHERE
        x.done = 0)
SELECT
    accumulator AS self_ref_nodes
FROM
    x
WHERE
    done = 1;

So how does this work?

I have the recursive query, using the "X - Y -> Y - Z" rule. I also accumulate the chain, starting with nod1 - nod2, then adding any "new" nod2 when I find a match.

If I hit a loop then I want to stop, but I don't want to stop immediately. I want to stop on the next loop, so I use the "done" to handle this. I set it, and then stop when it's set to 1 on the next run through the recursion.

This prevents circular references, but if there were more elements then you might eventually hit the 100 limit.

It's also worth noting that I find a LOT of self-referencing loops, more than in your example:

L-J-K-L
K-L-J-K
J-K-L-J
I-J-K-L-J
D-A-B-D
D-A-B-C-B
B-D-A-B
C-B-C
C-B-D-A-B
B-C-B
A-B-D-A
A-B-C-B

Find only self referencing values using recursive query

Answers (1)

Related Questions