Guigui
Guigui

Reputation: 1115

Row numbers using group by and several columns

I am having some trouble with a SQL request I have to build in order to display data in a grid.

I have that temporary table as below (I put numbers instead of email address for the example):

GroupID | Email1 | Email2 
null    |   1    | 2
null    |   1    | 2
null    |   1    | null
null    |   3    | 1
null    |   2    | 2
null    |   4    | 2
null    |   5    | 6
null    |   6    | null

I need to update the table in order to set the GroupID as described below: If email1 or email2 match any other record, this record needs to have the same groupId as the other. For example (using the table above) :

GroupID | Email1 | Email2
**1**   |   1    | 2
**1**   |   1    | 2
**1**   |   1    | null
**1**   |   3    | 1
**1**   |   2    | 2
**1**   |   4    | 2
**2**   |   5    | 6
**2**   |   6    | null

I tried something like that :

UPDATE a
SET a.GroupId = b.GroupId
FROM   #temp a
INNER JOIN (SELECT Email, 
                   ROW_NUMBER() OVER (ORDER BY ISNULL(Email,'zzzzzzzz')) GroupId
            FROM (SELECT Email1 Email
                  FROM #temp
                  GROUP BY Email1
                  UNION ALL 
                  SELECT Email2 Email   
                  FROM #temp                
                  GROUP BY Email2 
                 ) c
            GROUP BY Email
           ) b
ON a.Email1 = b.Email OR
   a.Email2 = b.Email OR 
   (b.Email IS NULL AND a.Email1 IS NULL AND a.Email2 IS NULL)

But this is not working a I am intending... For example the case where it is the Email2 that is equals to the Email 1 is not recognized as the same group... How can i make this request working as I would like ? Is it even possible ??

[EDIT] 2013/15/17 14:15 : In deed, for the rules, I meant "If email1 or email 2 matches email1 or email2 of any other records the should be of the same groupID"

Upvotes: 3

Views: 255

Answers (1)

ErikE
ErikE

Reputation: 50211

This can't be performed in a single JOIN because there can be very long chains of emails to be traversed, e.g. 1, 2 -> 2, 3 -> 3, 4 -> ... -> 99, 100. (You could possibly use a recursive CTE to do it in a single statement--working around the GROUP BY issues somehow--but you know what I mean.)

Here's one way to do it (SQL Server 2005 and up):

WITH E AS (
   SELECT
      Num = Row_Number() OVER (ORDER BY (SELECT 1)),
      *
   FROM dbo.EmailGroups
)
UPDATE E
SET E.GroupID = E.Num
;

WHILE @@RowCount > 0 BEGIN
   UPDATE E
   SET E.GroupID = X.MinGroupID
   FROM
      dbo.EmailGroups E
      INNER JOIN (
         SELECT
            E1.GroupID,
            MinGroupID = Min(E2.GroupID)
         FROM
            dbo.EmailGroups E1
            INNER JOIN dbo.EmailGroups E2
               ON E1.Email1 IN (E2.Email1, E2.Email2)
               OR E1.Email2 IN (E2.Email1, E2.Email2)
         GROUP BY
            E1.GroupID
         HAVING
            E1.GroupID <> Min(E2.GroupID)
      ) X ON E.GroupID = X.GroupID
   ;
END;

See this working in a SQL Fiddle.

This will result in each chained set of rows having the same GroupID, distinct from all other GroupIDs (but they won't be sequential, there will be gaps). If you need them to be sequential, perform a final update to set the GroupID to DENSE_RANK() OVER (ORDER BY GroupID) -- this is shown in the Fiddle.

Upvotes: 4

Related Questions