Reputation: 35337
I'm using a subquery to find duplicate rows in the system.
SELECT * FROM locations l2 WHERE l2.id IN (
SELECT min(l1.id) FROM locations l1
GROUP BY l1.address, l1.city, l1.industry_id
HAVING count(*) > 1
)
This query executes fairly quickly, around 0.4s. However, if I try wrapping an insert around it:
INSERT INTO duplicate_locations (
SELECT * FROM locations l2 WHERE l2.id IN (
SELECT min(l1.id) FROM locations l1
GROUP BY l1.address, l1.city, l1.industry_id
HAVING count(*) > 1
)
)
This query took 6 minutes for 100 rows.
Everything else on the system is running fine. Tables are the same structure, MyISAM, MariaDB 5.5 (no EXPLAIN INSERT). I thought at first it was a table issue, but I tried on a new table and got a similar result so there must be an issue with the query. The disks are fine, it is all CPU load while it is running. I was able to copy the entire locations table in seconds (tens of thousands of rows).
Upvotes: 0
Views: 31
Reputation: 15951
Subqueries in MySQL WHERE clauses are usually performance killers and should be avoiding in all but the simplest cases or in situations where not using them results in an insanely complicated query, try this instead:
INSERT INTO duplicate_locations
SELECT l0.* FROM locations AS l0
INNER JOIN (
SELECT min(id) AS firstID
FROM locations
GROUP BY address, city, industry_id
HAVING count(*) > 1
) AS firstL
ON l0.id = firstL.firstID
;
Upvotes: 1