Raul
Raul

Reputation: 607

Mysql optimization issue

Any guess why this statement is taking too long when handling 300 000 rows. Basically this query is meant for finding duplicates.

SELECT DISTINCT 
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
   ((a.address = b.address) OR 
    (a.zip = b.zip) OR 
    (a.city = b.city AND a.state = b.state) )
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1

enter image description here

Upvotes: 0

Views: 65

Answers (2)

Kickstart
Kickstart

Reputation: 21513

ORs often seem to have poor performance, and on JOIN conditions I would expect that to be worse. Try having 3 SELECTs (one for each of the ORed conditions) and UNION the results together. Suspect the DISTINCTS are not required either if this is done:-

SELECT  
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
   a.address = b.address
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1
UNION
SELECT  
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
    a.zip = b.zip
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1
UNION
SELECT  
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
    a.city = b.city AND a.state = b.state
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1

Upvotes: 3

user2001117
user2001117

Reputation: 3777

Now check using the Explain Plan after adding indexes on the column which are using in compare

Upvotes: 0

Related Questions