user3676641

Reputation: 51

Merge duplicate rows

I have a Customer table with ID and Email columns. I've written the following query to return all customers whose Email is shared with another customer:

SELECT ID, Email 
FROM Customer a
WHERE EXISTS (SELECT  1
              FROM Customer b
              WHERE a.Email = b.Email
              GROUP BY Email
              HAVING COUNT(Email) = 2)
ORDER BY Email

This is returning records that look like the following:

ID    Email
1     [email protected]
2     [email protected]
3     [email protected]
4     [email protected]

While this works, I actually need the data in the following format:

ID1    Email1          ID2    Email2
1      [email protected]   2      [email protected]
3      [email protected]     4      [email protected]

What is the best way to achieve this?

Upvotes: 0

Views: 70

Answers (3)

Hong Van Vit

Reputation: 2976

Try this (note that it also returns emails that appear only once, with ID and ID2 equal):

SELECT MIN(ID) ID, Email, MAX(ID) ID2, Email AS EMAIL2
FROM Customer GROUP BY Email

If you only want emails that appear exactly twice, add HAVING COUNT(Email) = 2:

SELECT MIN(ID) ID, Email, MAX(ID) ID2, Email AS EMAIL2
FROM Customer GROUP BY Email
HAVING COUNT(Email) = 2
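
As a quick sanity check of the HAVING version, here is a minimal sketch that runs it against an in-memory SQLite table from Python. The sample rows and the example.com addresses are made up for illustration, the output aliases are renamed to match the layout in the question, and SQLite only stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical sample data: two duplicated emails and one unique email.
rows = [
    (1, "a@example.com"),
    (2, "a@example.com"),
    (3, "b@example.com"),
    (4, "b@example.com"),
    (5, "c@example.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany("INSERT INTO Customer (ID, Email) VALUES (?, ?)", rows)

# One output row per email that occurs exactly twice.
pairs = conn.execute(
    """
    SELECT MIN(ID) AS ID1, Email AS Email1, MAX(ID) AS ID2, Email AS Email2
    FROM Customer
    GROUP BY Email
    HAVING COUNT(Email) = 2
    ORDER BY Email
    """
).fetchall()

for row in pairs:
    print(row)
```

The unique email is filtered out by the HAVING clause, and each remaining email collapses to a single row holding its lowest and highest ID.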

Upvotes: 0

stubs

Reputation: 264

Your layout assumes that there are never more than two rows with the same email.

Maybe list the IDs instead like below?

-- Collect every email that occurs more than once; Customers is filled in below.
DECLARE @Duplicates TABLE (Email varchar(50), Customers varchar(100))
INSERT @Duplicates SELECT Email, '' FROM Customer GROUP BY Email HAVING COUNT(*) > 1

-- Build a comma-separated list of IDs per email; STUFF strips the leading comma.
UPDATE d
SET
    Customers = STUFF((  SELECT ',' + CAST(ID AS varchar(10))
                         FROM Customer c
                         WHERE c.Email = d.Email
                         FOR XML PATH(''), TYPE).value('.', 'VARCHAR(max)'), 1, 1, '')
FROM @Duplicates AS d

SELECT * FROM @Duplicates
ORDER BY Email
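
If you want to try the "one row per email with a list of IDs" idea outside SQL Server, a string-aggregation function gives the same shape of result. The sketch below uses SQLite's GROUP_CONCAT from Python as a stand-in for the STUFF / FOR XML PATH trick; it is not the same syntax, and the sample rows are invented. SQLite does not guarantee the order of IDs inside each list, so the code sorts them after splitting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Email TEXT)")
# Hypothetical rows: one email with three customers, one with two, one unique.
conn.executemany(
    "INSERT INTO Customer (ID, Email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "a@example.com"),
        (3, "a@example.com"),
        (4, "b@example.com"),
        (5, "b@example.com"),
        (6, "c@example.com"),
    ],
)

# One row per duplicated email, with all of its IDs joined by commas.
result = conn.execute(
    """
    SELECT Email, GROUP_CONCAT(ID, ',') AS Customers
    FROM Customer
    GROUP BY Email
    HAVING COUNT(*) > 1
    ORDER BY Email
    """
).fetchall()

# Split each list back into sorted integers for a stable, comparable result.
duplicates = {
    email: sorted(int(i) for i in ids.split(","))
    for email, ids in result
}
print(duplicates)
```

Unlike the two-column layout, this handles any number of customers per email.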

Upvotes: 0

Gordon Linoff

Reputation: 1269563

One method is conditional aggregation, assuming each email appears at most twice (t stands for your table):

select max(case when seqnum = 1 then id end) as id_1,
       email as email_1,
       max(case when seqnum = 2 then id end) as id_2,
       email as email_2
from (select t.*, row_number() over (partition by email order by id) as seqnum
      from t
      ) t
group by email;

Actually, why not just do:

select email, count(*) as num_dups, min(id) as id_1,
       (case when count(*) > 1 then max(id) end) as id_2
from t
group by email;
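
To see what that last query returns when an email appears once, twice, or three times, here is a small sketch that runs it against an in-memory SQLite table from Python. The table name t is kept from the query, the rows are invented, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT)")
# Hypothetical rows: 'a' appears twice, 'b' once, 'c' three times.
conn.executemany(
    "INSERT INTO t (id, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "a@example.com"),
        (3, "b@example.com"),
        (4, "c@example.com"),
        (5, "c@example.com"),
        (6, "c@example.com"),
    ],
)

summary = conn.execute(
    """
    select email, count(*) as num_dups, min(id) as id_1,
           (case when count(*) > 1 then max(id) end) as id_2
    from t
    group by email
    order by email
    """
).fetchall()

for row in summary:
    print(row)
```

For the email with three rows, only the lowest and highest IDs are reported and the middle one is dropped; num_dups being greater than 2 is the signal that the two-ID layout is hiding rows.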

Upvotes: 1
