Rob Stanford
Rob Stanford

Reputation: 35

SQL query to find duplicate rows and return both IDs

I have a table of customers:

id | name | email
--------------------------
 1 | Rob  | [email protected]
 2 | Jim  | [email protected]
 3 | Dave | [email protected]
 4 | Fred | [email protected]
 5 | Ben  | [email protected]
 6 | Tom  | [email protected]

I'm trying to write an SQL query that returns all the rows with duplicate email addresses but... I'd like the query result to return the original ID and the duplicate ID. (The original ID is the first occurrence of the duplicate email.)

The desired result:

original_id | duplicate_id | email
-------------------------------------------
          1 |            2 | [email protected]
          3 |            5 | [email protected]
          3 |            6 | [email protected]

My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?

Upvotes: 2

Views: 5630

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521492

We could handle this using a join, but I might actually go for an option which generates a CSV list of id corresponding to duplicates:

SELECT
    email,
    GROUP_CONCAT(id ORDER BY id) AS duplicate_ids
FROM yourTable
GROUP BY email
HAVING COUNT(*) > 1

Functionally speaking, this gives you the same information you wanted in your question, but in what is a much simplified form in my opinion. Because we order the id values when concatenating, the original id will always appear first, on the left side of the CSV list. Also, if you have many duplicates your requested output could become verbose and harder to read.

Output:

enter image description here

Demo

Upvotes: 7

StanislavL
StanislavL

Reputation: 57391

select
  orig.original_id,
  t.id as duplicate_id,
  orig.email
from t
  inner join (select min(id) as original_id, email
              from t
              group by email
              having count(*)>1) orig on orig.email = t.email
having t.id!=orig.original_id

By the subquery we can find all ids for emails with duplicates.

Then we join the subquery by email and for each one use minimal id as original

UPDATE: http://rextester.com/BLIHK20984 cloned @Tim Biegeleisen's answer

Upvotes: 2

Related Questions