Reputation: 9
We are working on a table with roughly 13 million rows. Our goal is to look for duplicates within this table for only one restaurant (~300,000 rows). Our criteria for duplicates are same last name, same first 2 letters of first name, and same phone or email. Each of these are their own column. Our strategy for now is to create two identical, temporary tables for all rows from the restaurant and then to join them on the criteria above and then return id, first name, last name, phone, and email from the first table.
SELECT
DISTINCT t1.id, t1.firstname, t1.lastname, t1.phone, t1.email
FROM
(
SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
) AS t1
INNER JOIN
(
SELECT lmoc2.id, lmoc2.firstname, lmoc2.lastname, lmoc2.phone, lmoc2.email
FROM loyalty_member_opentable_customer lmoc2
WHERE lmoc2.opentable_restaurant_id=2296
AND lmoc2.lastname NOT LIKE '%Tour%'
) AS t2
ON STRCMP(t1.lastname,t2.lastname)=0
AND t1.id!=t2.id
AND STRCMP(LEFT(t1.firstname,2),LEFT(t2.firstname,2))=0
AND (STRCMP(t1.phone,t2.phone)=0 OR STRCMP(t1.email,t2.email)=0)
ORDER BY t1.lastname, t1.firstname
The issue is that this query takes north of 48 hours to run. Can anybody think of a more efficient way to run this? We need all of the duplicates so that the restaurant could combine them as they see fit.
Upvotes: 0
Views: 79
Reputation: 3618
You are not creating a temporary table but using subqueries and that will be slow with 13 million rows. Create a single real temporary table with all the data you need (SELECT INTO
).
This is what I'd try :
/* Creating a temporary table */
SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
INTO tempRestaurant
FROM loyalty_member_opentable_customer AS lmoc
WHERE
lmoc.opentable_restaurant_id=2296 AND
lmoc.lastname NOT LIKE '%Tour%'
/* Select duplicates */
SELECT * FROM
tempRestaurant AS t1
INNER JOIN tempRestaurant AS t2 ON
STRCMP(t1.lastname,t2.lastname)=0
AND t1.id!=t2.id
WHERE
STRCMP(LEFT(t1.firstname,2), LEFT(t2.firstname,2))=0 AND
( STRCMP(t1.phone,t2.phone)=0 OR STRCMP(t1.email,t2.email)=0 )
Upvotes: 1
Reputation: 11
This SQL will help you find the duplicates
SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
AND lmoc.lastname BETWEEN 'ha' AND 'i'
GROUP BY lmoc.opentable_restaurant_id, lmoc.id, LEFT(lmoc.firstname,2), lmoc.lastname, lmoc.phone, lmoc.email
HAVING COUNT(*) > 1
If you got a primary key, they you can easily keep the recent one and purge the old ones, with this SQL
DELETE
lmoc.primary_id
FROM loyalty_member_opentable_customer lmoc
LEFT JOIN
(SELECT
MAX(lmoc.primary_id) AS id
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
AND lmoc.lastname BETWEEN 'ha' AND 'i'
GROUP BY lmoc.opentable_restaurant_id, lmoc.id, LEFT(lmoc.firstname,2), lmoc.lastname, lmoc.phone, lmoc.email
) nodup
ON adjuster.id = nodup.id
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
AND lmoc.lastname BETWEEN 'ha' AND 'i'
AND nodup.id IS NULL";
Upvotes: 1
Reputation: 51888
Why not simply do
SELECT lmoc.lastname, lmoc.firstname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
GROUP BY lmoc.lastname, LEFT(lmoc.firstname, 2), lmoc.phone, lmoc.email
HAVING COUNT(*) > 1;
?
Upvotes: 1