SQL query to find duplicate rows and return both IDs

Question

I have a table of customers:

id | name | email
--------------------------
 1 | Rob  | spam@email.com
 2 | Jim  | spam@email.com
 3 | Dave | ham@email.com
 4 | Fred | eggs@email.com
 5 | Ben  | ham@email.com
 6 | Tom  | ham@email.com

I'm trying to write an SQL query that returns all the rows with duplicate email addresses, but I'd like the result to include both the original ID and the duplicate ID. (The original ID is the ID of the first row in which the duplicated email occurs.)

The desired result:

original_id | duplicate_id | email
-------------------------------------------
          1 |            2 | spam@email.com
          3 |            5 | ham@email.com
          3 |            6 | ham@email.com

My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?

StanislavL · Accepted Answer

select
  orig.original_id,
  t.id as duplicate_id,
  orig.email
from t
  inner join (select min(id) as original_id, email
              from t
              group by email
              having count(*)>1) orig on orig.email = t.email
where t.id <> orig.original_id

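The subquery finds, for every email that appears more than once, the smallest id; that id is treated as the original.

We then join the subquery back to the table on email, which pairs every row sharing that email with the original id. The final condition drops the pair in which the original row is matched with itself.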
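
To check the approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the column types, and the added order by are assumptions made for illustration; the table name t follows the answer. The self-match is filtered with where so the statement runs on SQLite.

```python
import sqlite3

# In-memory database; the table name "t" follows the answer,
# the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, name text, email text)")
conn.executemany(
    "insert into t (id, name, email) values (?, ?, ?)",
    [
        (1, "Rob", "spam@email.com"),
        (2, "Jim", "spam@email.com"),
        (3, "Dave", "ham@email.com"),
        (4, "Fred", "eggs@email.com"),
        (5, "Ben", "ham@email.com"),
        (6, "Tom", "ham@email.com"),
    ],
)

# Subquery: smallest id per email that occurs more than once.
# Outer query: every other row with that email is a duplicate.
QUERY = """
select orig.original_id, t.id as duplicate_id, orig.email
from t
  inner join (select min(id) as original_id, email
              from t
              group by email
              having count(*) > 1) orig on orig.email = t.email
where t.id <> orig.original_id
order by orig.original_id, t.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the six sample rows this yields the three pairs listed in the question: (1, 2), (3, 5) and (3, 6).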

UPDATE: Demo, cloned from @Tim Biegeleisen's answer: http://rextester.com/BLIHK20984
