SQL Server Create Grouping For Related Records

Question

I'm running into an interesting scenario trying to assign an arbitrary FamilyId to fields that are related to each other.

Here is the structure that we're currently working with:

DataId  OriginalDataId
3       1
4       1
5       1
6       1
3       2
4       2
5       2
6       2
7       10
8       10
9       10
11      15

What we're attempting to do is add a FamilyId column to all DataIds that have a relationship between each other.

In this case, Id's 3, 4, 5, and 6 have a relationship to 1. But 3, 4, 5, and 6 also have a relationship with 2. So 1, 2, 3, 4, 5, and 6 should all be considered to be in the same FamilyId.

7, 8, and 9 only have a relationship to 10, which puts this into a separate FamilyId. Same for 11 and 15.

What I am expecting as a result from this are the following results:

DataId  FamilyId
1       1
2       1
3       1
4       1
5       1
6       1
7       2
8       2
9       2
10      2
11      3
15      3

Sample data, structure, and queries:

Declare @Results_Stage Table
(
    DataId          BigInt Not Null,
    OriginalDataId  BigInt Null
)


Insert @Results_Stage
Values (3,1), (4,1), (5,1), (6,1), (3,2), (4,2), (5,2), (6,2), (7,10), (8, 10), (9, 10), (11, 15)


Select DataId, Row_Number() Over(Partition By DataId Order By OriginalDataId Asc) FamilyId
From   @Results_Stage       R
Union
Select OriginalDataId, Row_Number() Over(Partition By DataId Order By OriginalDataId Asc) FamilyId
From   @Results_Stage

I'm positive my attempt is nowhere near correct, but I'm honestly not sure where to even start on this -- or if it's even possible in SQL Server.

Does anyone have an idea on how to tackle this issue, or at least, something to point me in the right direction?

Edit Below is a query I've come up with so far to identify the other DataId records that should belong to the same FamilyId

Declare @DataId BigInt = 1

;With Children As
(
    Select      Distinct X.DataId
    From        @Results_Stage  S
    Outer Apply
    (
        Select  Distinct DataId
        From    @Results_Stage  R
        Where   R.OriginalDataId = S.DataId
        Or      R.OriginalDataId = S.OriginalDataId
    ) X
    Where   S.DataId = @DataId
    Or      S.OriginalDataId = @DataId
)
Select  Distinct O.OriginalDataId
From    Children    C
Outer Apply
(
    Select  S.OriginalDataId
    From    @Results_Stage  S
    Where   S.DataId = C.DataId
) O
Union 
Select  DataId
From    Children

Giorgos Betsos · Accepted Answer

The following query, which employs FOR XML PATH:

SELECT R.OriginalDataId,
          STUFF((
             SELECT ', ' + + CAST([DataId] AS VARCHAR(MAX)) 
             FROM #Results_Stage 
             WHERE (OriginalDataId = R.OriginalDataId) 
             FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
          ,1,2,'') AS GroupValues
   FROM #Results_Stage R
   GROUP BY R.OriginalDataId

can be used to produce this output:

OriginalDataId  GroupValues
===========================
1               3, 4, 5, 6
2               3, 4, 5, 6
10              7, 8, 9
15              11

Using the above result set, we can easily identify each group and thus have something upon which DENSE_RANK() can be applied:

;WITH GroupedData AS (
   SELECT R.OriginalDataId,
          STUFF((
             SELECT ', ' + + CAST([DataId] AS VARCHAR(MAX)) 
             FROM #Results_Stage 
             WHERE (OriginalDataId = R.OriginalDataId) 
             FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
          ,1,2,'') AS GroupValues
   FROM #Results_Stage R
   GROUP BY R.OriginalDataId
), Families AS (
   SELECT OriginalDataId, DENSE_RANK() OVER (ORDER BY GroupValues) AS FamilyId
   FROM GroupedData 
)
SELECT OriginalDataId AS DataId, FamilyId  
FROM Families

UNION 

SELECT DataId, F.FamilyId
FROM #Results_Stage R
INNER JOIN Families F ON R.OriginalDataId = F.OriginalDataId

ORDER BY FamilyId

Output from above is:

  DataId    FamilyId
   ===================
    11      1
    15      1
    1       2
    2       2
    3       2
    4       2
    5       2
    6       2
    7       3
    8       3
    9       3
    10      3

SQL Server Create Grouping For Related Records

Answers (2)

Related Questions