Reputation: 29
I have a SQL Server 2005 table as follows:
parent child
1 a
2 a
2 b
3 b
3 c
4 c
5 d
6 d
7 d
8 e
9 e
9 f
10 f
Each parent can have one or more child, each child can also have one or more parents. How can I find (or group) all parents that are related?
In above case, I want to group:
parent 1, 2, 3, 4 into group 1
parent 5, 6, 7 into group 2
parent 8, 9, 10 into group 3
A result from the query would look like: (Doesn't matter which parent use as group as long as it is from one of those parent in the group, min ID, or max ID would do)
parent child parent_group
1 a 1
2 a 1
2 b 1
3 b 1
3 c 1
4 c 1
5 d 5
6 d 5
7 d 5
8 e 8
9 e 8
9 f 8
10 f 8
This is similar to those standard boss - subordinate recursive SQL questions except the subordinate may have more than 1 boss.
Is it possible to use SQL to generate result as above? If so how?
Appreciate any help.
Upvotes: 0
Views: 394
Reputation: 89171
There is no way to do this in a single query. I found a blog post (with some ruby code) describing how you can calculate connected components in SQL (MySQL flavor).
Based on the article, I wrote the following SQL (SQL Server flavor):
-- Step 1: Create temporary tables
CREATE TABLE #items (
id int PRIMARY KEY,
component_id int
);
CREATE TABLE #links (
first int,
second int,
PRIMARY KEY (first, second)
);
CREATE TABLE #components_to_merge (
component1 int,
component2 int
PRIMARY KEY (component1, component2)
);
-- Step 2: Populate tables
INSERT INTO #items (id, component_id)
SELECT DISTINCT parent, parent
FROM children;
INSERT INTO #links (first, second)
SELECT DISTINCT c1.parent, c2.parent
FROM children c1
INNER JOIN children c2 ON c1.child = c2.child
WHERE c1.parent <> c2.parent;
-- Step 3: Merge components
WHILE 1 = 1 BEGIN
-- Step 3.1: Update #components_to_merge
TRUNCATE TABLE #components_to_merge;
INSERT INTO #components_to_merge (component1, component2)
SELECT DISTINCT t1.component_id, t2.component_id
FROM #links l
INNER JOIN #items t1 ON t1.id = l.first
INNER JOIN #items t2 ON t2.id = l.second
WHERE t1.component_id <> t2.component_id;
INSERT INTO #components_to_merge (component1, component2)
SELECT component2, component1
FROM #components_to_merge m1
WHERE component2 NOT IN (
SELECT m2.component2
FROM #components_to_merge m2
WHERE m2.component1 = m1.component1
);
IF (SELECT COUNT(*) FROM #components_to_merge) = 0
BREAK;
-- Step 3.2: Update #items
UPDATE i
SET component_id = target
FROM #items i
INNER JOIN (
SELECT
component1 AS source,
MIN(component2) AS target
FROM #components_to_merge
GROUP BY component1
) new_components
ON source = component_id
WHERE target < component_id;
END;
-- Step 4: Generate result
SELECT parent, child, component_id
FROM children
INNER JOIN #items ON id = parent;
You could wrap this up in a stored procedure:
CREATE PROCEDURE CalculateComponents AS
BEGIN
SET NOCOUNT ON;
BEGIN TRANSACTION;
-- SQL code from above
ROLLBACK TRANSACTION;
END;
And then call it with
EXEC CalculateComponents;
Output:
parent child component_id
1 a 1
2 a 1
2 b 1
3 b 1
3 c 1
4 c 1
5 d 5
6 d 5
7 d 5
8 e 8
9 e 8
9 f 8
10 f 8
Upvotes: 1