Reputation: 35
I have a table of customers:
id | name | email
--------------------------
1 | Rob | [email protected]
2 | Jim | [email protected]
3 | Dave | [email protected]
4 | Fred | [email protected]
5 | Ben | [email protected]
6 | Tom | [email protected]
I'm trying to write an SQL query that returns all the rows with duplicate email addresses but... I'd like the query result to return the original ID and the duplicate ID. (The original ID is the first occurrence of the duplicate email.)
The desired result:
original_id | duplicate_id | email
-------------------------------------------
1 | 2 | [email protected]
3 | 5 | [email protected]
3 | 6 | [email protected]
My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?
Upvotes: 2
Views: 5630
Reputation: 521492
We could handle this using a join, but I might actually go for an option which generates a CSV list of id corresponding to duplicates:
SELECT
email,
GROUP_CONCAT(id ORDER BY id) AS duplicate_ids
FROM yourTable
GROUP BY email
HAVING COUNT(*) > 1
Functionally speaking, this gives you the same information you wanted in your question, but in what is a much simplified form in my opinion. Because we order the id
values when concatenating, the original id
will always appear first, on the left side of the CSV list. Also, if you have many duplicates your requested output could become verbose and harder to read.
Output:
Upvotes: 7
Reputation: 57391
select
orig.original_id,
t.id as duplicate_id,
orig.email
from t
inner join (select min(id) as original_id, email
from t
group by email
having count(*)>1) orig on orig.email = t.email
having t.id!=orig.original_id
By the subquery we can find all ids for emails with duplicates.
Then we join the subquery by email and for each one use minimal id as original
UPDATE: http://rextester.com/BLIHK20984 cloned @Tim Biegeleisen's answer
Upvotes: 2