Devon Bessemer
Devon Bessemer

Reputation: 35337

What would cause this SQL insert to run very slowly?

I'm using a subquery to find duplicate rows in the system.

SELECT * FROM locations l2 WHERE l2.id IN (
    SELECT min(l1.id) FROM locations l1
    GROUP BY l1.address, l1.city, l1.industry_id
    HAVING count(*) > 1
)

This query executes fairly quickly, around 0.4s. However, if I try wrapping an insert around it:

INSERT INTO duplicate_locations (
    SELECT * FROM locations l2 WHERE l2.id IN (
        SELECT min(l1.id) FROM locations l1
        GROUP BY l1.address, l1.city, l1.industry_id
        HAVING count(*) > 1
    )
)

This query took 6 minutes for 100 rows.

Everything else on the system is running fine. Tables are the same structure, MyISAM, MariaDB 5.5 (no EXPLAIN INSERT). I thought at first it was a table issue, but I tried on a new table and got a similar result so there must be an issue with the query. The disks are fine, it is all CPU load while it is running. I was able to copy the entire locations table in seconds (tens of thousands of rows).

Upvotes: 0

Views: 31

Answers (1)

Uueerdo
Uueerdo

Reputation: 15951

Subqueries in MySQL WHERE clauses are usually performance killers and should be avoiding in all but the simplest cases or in situations where not using them results in an insanely complicated query, try this instead:

INSERT INTO duplicate_locations 
    SELECT l0.* FROM locations AS l0
    INNER JOIN (
        SELECT min(id) AS firstID
        FROM locations
        GROUP BY address, city, industry_id
        HAVING count(*) > 1
    ) AS firstL 
    ON l0.id = firstL.firstID
;

Upvotes: 1

Related Questions