Reputation: 16917
I'm running into an interesting scenario trying to assign an arbitrary FamilyId
to fields that are related to each other.
Here is the structure that we're currently working with:
DataId OriginalDataId
3 1
4 1
5 1
6 1
3 2
4 2
5 2
6 2
7 10
8 10
9 10
11 15
What we're attempting to do is add a FamilyId
column to all DataId
s that have a relationship between each other.
In this case, Id's 3
, 4
, 5
, and 6
have a relationship to 1
. But 3
, 4
, 5
, and 6
also have a relationship with 2
. So 1
, 2
, 3
, 4
, 5
, and 6
should all be considered to be in the same FamilyId
.
7
, 8
, and 9
only have a relationship to 10
, which puts this into a separate FamilyId
. Same for 11
and 15
.
What I am expecting as a result from this are the following results:
DataId FamilyId
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 3
15 3
Sample data, structure, and queries:
Declare @Results_Stage Table
(
DataId BigInt Not Null,
OriginalDataId BigInt Null
)
Insert @Results_Stage
Values (3,1), (4,1), (5,1), (6,1), (3,2), (4,2), (5,2), (6,2), (7,10), (8, 10), (9, 10), (11, 15)
Select DataId, Row_Number() Over(Partition By DataId Order By OriginalDataId Asc) FamilyId
From @Results_Stage R
Union
Select OriginalDataId, Row_Number() Over(Partition By DataId Order By OriginalDataId Asc) FamilyId
From @Results_Stage
I'm positive my attempt is nowhere near correct, but I'm honestly not sure where to even start on this -- or if it's even possible in SQL Server.
Does anyone have an idea on how to tackle this issue, or at least, something to point me in the right direction?
Edit Below is a query I've come up with so far to identify the other DataId
records that should belong to the same FamilyId
Declare @DataId BigInt = 1
;With Children As
(
Select Distinct X.DataId
From @Results_Stage S
Outer Apply
(
Select Distinct DataId
From @Results_Stage R
Where R.OriginalDataId = S.DataId
Or R.OriginalDataId = S.OriginalDataId
) X
Where S.DataId = @DataId
Or S.OriginalDataId = @DataId
)
Select Distinct O.OriginalDataId
From Children C
Outer Apply
(
Select S.OriginalDataId
From @Results_Stage S
Where S.DataId = C.DataId
) O
Union
Select DataId
From Children
Upvotes: 2
Views: 163
Reputation: 8215
Check this ... it doesn't look too nice but is doing the job :)
DECLARE @T TABLE (DataId INT, OriginalDataId INT)
INSERT INTO @T(DataId , OriginalDataId)
select 3,1
union all select 4,1
union all select 5,1
union all select 6,1
union all select 3,2
union all select 4,2
union all select 5,2
union all select 6,2
union all select 7,10
union all select 8,10
union all select 9,10
union all select 11,15
SELECT * FROM @T
;WITH f AS (
SELECT DISTINCT OriginalDataId FROM @T
)
, m AS (
SELECT DISTINCT
DataId , OriginalDataId = MIN(OriginalDataId)
FROM @T
GROUP BY DataId
)
, m2 AS (
SELECT DISTINCT
x.DataId , x.OriginalDataId
FROM @T AS x
LEFT OUTER JOIN m ON x.DataId = m.DataId AND x.OriginalDataId = m.OriginalDataId
WHERE m.DataId IS NULL
)
, m3 AS (
SELECT DISTINCT DataId = x.OriginalDataId , m.OriginalDataId
FROM m2 AS x
INNER JOIN m ON x.DataId = m.DataId
)
, m4 AS (
SELECT DISTINCT
DataId = OriginalDataId , OriginalDataId
FROM @T
WHERE OriginalDataId NOT IN(SELECT DataId FROM m3)
UNION
SELECT DISTINCT
x.DataId , f.OriginalDataId
FROM f
INNER JOIN m AS x on x.OriginalDataId = f.OriginalDataId
WHERE x.DataId NOT IN(SELECT DataId FROM m3)
UNION
SELECT DataId , OriginalDataId FROM m3
)
, list AS (
SELECT
x.DataId, FamilyId = DENSE_RANK() OVER(ORDER BY x.OriginalDataId )
FROM m4 AS x
)
SELECT * FROM list
-- OUTPUT
DataId FamilyId
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 3
15 3
Upvotes: 1
Reputation: 72195
The following query, which employs FOR XML PATH
:
SELECT R.OriginalDataId,
STUFF((
SELECT ', ' + + CAST([DataId] AS VARCHAR(MAX))
FROM #Results_Stage
WHERE (OriginalDataId = R.OriginalDataId)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS GroupValues
FROM #Results_Stage R
GROUP BY R.OriginalDataId
can be used to produce this output:
OriginalDataId GroupValues
===========================
1 3, 4, 5, 6
2 3, 4, 5, 6
10 7, 8, 9
15 11
Using the above result set, we can easily identify each group and thus have something upon which DENSE_RANK()
can be applied:
;WITH GroupedData AS (
SELECT R.OriginalDataId,
STUFF((
SELECT ', ' + + CAST([DataId] AS VARCHAR(MAX))
FROM #Results_Stage
WHERE (OriginalDataId = R.OriginalDataId)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS GroupValues
FROM #Results_Stage R
GROUP BY R.OriginalDataId
), Families AS (
SELECT OriginalDataId, DENSE_RANK() OVER (ORDER BY GroupValues) AS FamilyId
FROM GroupedData
)
SELECT OriginalDataId AS DataId, FamilyId
FROM Families
UNION
SELECT DataId, F.FamilyId
FROM #Results_Stage R
INNER JOIN Families F ON R.OriginalDataId = F.OriginalDataId
ORDER BY FamilyId
Output from above is:
DataId FamilyId
===================
11 1
15 1
1 2
2 2
3 2
4 2
5 2
6 2
7 3
8 3
9 3
10 3
Upvotes: 1