Reputation: 1115
I am having some trouble with a SQL request I have to build in order to display data in a grid.
I have that temporary table as below (I put numbers instead of email address for the example):
GroupID | Email1 | Email2
null | 1 | 2
null | 1 | 2
null | 1 | null
null | 3 | 1
null | 2 | 2
null | 4 | 2
null | 5 | 6
null | 6 | null
I need to update the table in order to set the GroupID as described below: If email1 or email2 match any other record, this record needs to have the same groupId as the other. For example (using the table above) :
GroupID | Email1 | Email2
**1** | 1 | 2
**1** | 1 | 2
**1** | 1 | null
**1** | 3 | 1
**1** | 2 | 2
**1** | 4 | 2
**2** | 5 | 6
**2** | 6 | null
I tried something like that :
UPDATE a
SET a.GroupId = b.GroupId
FROM #temp a
INNER JOIN (SELECT Email,
ROW_NUMBER() OVER (ORDER BY ISNULL(Email,'zzzzzzzz')) GroupId
FROM (SELECT Email1 Email
FROM #temp
GROUP BY Email1
UNION ALL
SELECT Email2 Email
FROM #temp
GROUP BY Email2
) c
GROUP BY Email
) b
ON a.Email1 = b.Email OR
a.Email2 = b.Email OR
(b.Email IS NULL AND a.Email1 IS NULL AND a.Email2 IS NULL)
But this is not working a I am intending... For example the case where it is the Email2 that is equals to the Email 1 is not recognized as the same group... How can i make this request working as I would like ? Is it even possible ??
[EDIT] 2013/15/17 14:15 : In deed, for the rules, I meant "If email1 or email 2 matches email1 or email2 of any other records the should be of the same groupID"
Upvotes: 3
Views: 255
Reputation: 50211
This can't be performed in a single JOIN
because there can be very long chains of emails to be traversed, e.g. 1, 2
-> 2, 3
-> 3, 4
-> ... -> 99, 100
. (You could possibly use a recursive CTE to do it in a single statement--working around the GROUP BY
issues somehow--but you know what I mean.)
Here's one way to do it (SQL Server 2005 and up):
WITH E AS (
SELECT
Num = Row_Number() OVER (ORDER BY (SELECT 1)),
*
FROM dbo.EmailGroups
)
UPDATE E
SET E.GroupID = E.Num
;
WHILE @@RowCount > 0 BEGIN
UPDATE E
SET E.GroupID = X.MinGroupID
FROM
dbo.EmailGroups E
INNER JOIN (
SELECT
E1.GroupID,
MinGroupID = Min(E2.GroupID)
FROM
dbo.EmailGroups E1
INNER JOIN dbo.EmailGroups E2
ON E1.Email1 IN (E2.Email1, E2.Email2)
OR E1.Email2 IN (E2.Email1, E2.Email2)
GROUP BY
E1.GroupID
HAVING
E1.GroupID <> Min(E2.GroupID)
) X ON E.GroupID = X.GroupID
;
END;
See this working in a SQL Fiddle.
This will result in each chained set of rows having the same GroupID
, distinct from all other GroupIDs
(but they won't be sequential, there will be gaps). If you need them to be sequential, perform a final update to set the GroupID
to DENSE_RANK() OVER (ORDER BY GroupID)
-- this is shown in the Fiddle.
Upvotes: 4