Reputation: 2988
I'm really stuck at the moment, as my query works, but only with some sample data, in the actual database with 15'000 clients I don't even get a response in time.
I have the following table:
-Client-
id
firstname
lastname
Now my query has to get all possible duplicates and list them, so lets say we have Client A with id 1, B with id 2 and C with id = 3 that have the same first- and lastname. The output should look like this:
id | duplicateID | client | duplicate
1 | 2 | A | B
1 | 3 | A | C
2 | 3 | B | C
My query looks like this:
SELECT
c.id AS clientID,
d.id AS duplicateID,
CONCAT(c.firstname, ' ', c.lastname) AS fullName
FROM Client AS c
JOIN Client AS d
ON d.lastname = c.lastname
AND d.firstname = c.firstname
AND d.id != c.id
AND d.id > c.id
ORDER BY fullName, c.id
Is there any way to increase the performance without losing any results? I looked into this answer, but there I only get one duplicate per client, where I want all of the duplicates.
Any help or tip is appreciated, thanks
Edit: SQL Fiddle as requested
Upvotes: 0
Views: 48
Reputation: 1269623
This is your query every so slightly simplified (one on
condition is removed).
SELECT c.id AS clientID, d.id AS duplicateID, CONCAT(c.firstname, ' ', c.lastname) AS fullName
FROM Client c JOIN
Client d
ON d.lastname = c.lastname AND d.firstname = c.firstname AND d.id > c.id
ORDER BY fullName, c.id;
Try using an index.
create index client_lastname_firstname_id on client(lastname, firstname, id);
This should facilitate the join
. If you have a lot of data, the order by
could become a performance bottleneck.
Another solution would be to put all the duplicates on one line. This only involves a group by
, and gives the list of duplicate ids for each name:
SELECT CONCAT(c.firstname, ' ', c.lastname) AS fullName,
group_concat(c.id order by c.id) AS clientIDs
FROM Client c
GROUP BY c.firstname, c.lastname;
Upvotes: 1