codeIsFun
codeIsFun

Reputation: 29

How to find all 'related' parents from a table containing set of (parent, child)?

I have a SQL Server 2005 table as follows:

parent  child
1       a
2       a
2       b
3       b
3       c
4       c
5       d
6       d
7       d
8       e
9       e
9       f
10      f

Each parent can have one or more child, each child can also have one or more parents. How can I find (or group) all parents that are related?

In above case, I want to group:

parent 1, 2, 3, 4 into group 1
parent 5, 6, 7 into group 2
parent 8, 9, 10 into group 3

A result from the query would look like: (Doesn't matter which parent use as group as long as it is from one of those parent in the group, min ID, or max ID would do)

parent  child  parent_group
1       a      1
2       a      1
2       b      1
3       b      1
3       c      1
4       c      1
5       d      5
6       d      5
7       d      5
8       e      8
9       e      8
9       f      8
10      f      8

This is similar to those standard boss - subordinate recursive SQL questions except the subordinate may have more than 1 boss.

Is it possible to use SQL to generate result as above? If so how?

Appreciate any help.

Upvotes: 0

Views: 394

Answers (1)

Markus Jarderot
Markus Jarderot

Reputation: 89171

There is no way to do this in a single query. I found a blog post (with some ruby code) describing how you can calculate connected components in SQL (MySQL flavor).

Based on the article, I wrote the following SQL (SQL Server flavor):

-- Step 1: Create temporary tables
CREATE TABLE #items (
    id int PRIMARY KEY,
    component_id int
);

CREATE TABLE #links (
    first int,
    second int,
    PRIMARY KEY (first, second)
);

CREATE TABLE #components_to_merge (
    component1 int,
    component2 int
    PRIMARY KEY (component1, component2)
);

-- Step 2: Populate tables
INSERT INTO #items (id, component_id)
SELECT DISTINCT parent, parent
FROM children;

INSERT INTO #links (first, second)
SELECT DISTINCT c1.parent, c2.parent
FROM children c1
INNER JOIN children c2 ON c1.child = c2.child
WHERE c1.parent <> c2.parent;

-- Step 3: Merge components
WHILE 1 = 1 BEGIN
    -- Step 3.1: Update #components_to_merge
    TRUNCATE TABLE #components_to_merge;

    INSERT INTO #components_to_merge (component1, component2)
    SELECT DISTINCT t1.component_id, t2.component_id
    FROM #links l
    INNER JOIN #items t1 ON t1.id = l.first
    INNER JOIN #items t2 ON t2.id = l.second
    WHERE t1.component_id <> t2.component_id;

    INSERT INTO #components_to_merge (component1, component2)
    SELECT component2, component1
    FROM #components_to_merge m1
    WHERE component2 NOT IN (
        SELECT m2.component2
        FROM #components_to_merge m2
        WHERE m2.component1 = m1.component1
    );

    IF (SELECT COUNT(*) FROM #components_to_merge) = 0
        BREAK;

    -- Step 3.2: Update #items
    UPDATE i
    SET component_id = target
    FROM #items i
    INNER JOIN (
        SELECT
            component1 AS source,
            MIN(component2) AS target
        FROM #components_to_merge
        GROUP BY component1
    ) new_components
        ON source = component_id
    WHERE target < component_id;
END;

-- Step 4: Generate result
SELECT parent, child, component_id
FROM children
INNER JOIN #items ON id = parent;

You could wrap this up in a stored procedure:

CREATE PROCEDURE CalculateComponents AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- SQL code from above

    ROLLBACK TRANSACTION;
END;

And then call it with

EXEC CalculateComponents;

Output:

parent  child   component_id
1       a       1
2       a       1
2       b       1
3       b       1
3       c       1
4       c       1
5       d       5
6       d       5
7       d       5
8       e       8
9       e       8
9       f       8
10      f       8

Upvotes: 1

Related Questions