Michael Kunst
Michael Kunst

Reputation: 2988

Get Duplicates efficiently

I'm really stuck at the moment, as my query works, but only with some sample data, in the actual database with 15'000 clients I don't even get a response in time.

I have the following table:

-Client-
id
firstname
lastname

Now my query has to get all possible duplicates and list them, so lets say we have Client A with id 1, B with id 2 and C with id = 3 that have the same first- and lastname. The output should look like this:

id | duplicateID | client | duplicate
1  | 2           | A      | B
1  | 3           | A      | C
2  | 3           | B      | C

My query looks like this:

SELECT
  c.id AS clientID,
  d.id AS duplicateID,
  CONCAT(c.firstname, ' ', c.lastname) AS fullName
FROM Client AS c
JOIN Client AS d
  ON d.lastname = c.lastname
  AND d.firstname = c.firstname
  AND d.id != c.id
  AND d.id > c.id
ORDER BY fullName, c.id

Is there any way to increase the performance without losing any results? I looked into this answer, but there I only get one duplicate per client, where I want all of the duplicates.

Any help or tip is appreciated, thanks

Edit: SQL Fiddle as requested

Upvotes: 0

Views: 48

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269623

This is your query every so slightly simplified (one on condition is removed).

SELECT c.id AS clientID, d.id AS duplicateID, CONCAT(c.firstname, ' ', c.lastname) AS fullName
FROM Client c JOIN
     Client d
     ON d.lastname = c.lastname AND d.firstname = c.firstname AND d.id > c.id
ORDER BY fullName, c.id;

Try using an index.

create index client_lastname_firstname_id on client(lastname, firstname, id);

This should facilitate the join. If you have a lot of data, the order by could become a performance bottleneck.

Another solution would be to put all the duplicates on one line. This only involves a group by, and gives the list of duplicate ids for each name:

SELECT CONCAT(c.firstname, ' ', c.lastname) AS fullName,
       group_concat(c.id order by c.id) AS clientIDs
 FROM Client c 
GROUP BY c.firstname, c.lastname;

Upvotes: 1

Related Questions