Reputation: 69
I have a table that now has about 1m rows. The following query need about 5 seconds to complete. What you suggest optimizing query speed?
# Thread_id: 14 Schema: defrop_defrop QC_hit: No
# Query_time: 5.573048 Lock_time: 0.591625 Rows_sent: 0 Rows_examined: 1006391
# Rows_affected: 1
UPDATE `backlinks` as a
INNER JOIN(SELECT b.`id` as bid
FROM `backlinks` b
WHERE b.`googlebot_id` IS NULL AND b.`used_time` IS NULL AND
b.`campaign_id` IN (SELECT `id` FROM `campaigns` WHERE `status`=true) GROUP BY b.`campaign_id` ORDER BY RAND() limit 1
) as c
ON (a.id = c.bid)
SET a.`crawler_id` = '10.0.0.13', a.`used_time`=NOW();
campaign_id, googlebot_id are foreing keys, indexers. used_time and crawler_id are indexers Screenshot from the table phpmyadmin
Upvotes: 0
Views: 88
Reputation: 1269753
This is the query formatted so I can read it better:
UPDATE backlinks bl JOIN
(SELECT bl2.id as bid
FROM backlinks bl2
WHERE bl2.googlebot_id IS NULL AND
bl2.used_time IS NULL AND
bl2.campaign_id IN (SELECT c.id FROM campaigns c WHERE status = true)
GROUP BY b.campaign_id
ORDER BY RAND()
LIMIT 1
) bl2
ON bl.id = bl2.bid
SET bl.crawler_id = '10.0.0.13',
bl.used_time = NOW();
First, the GROUP BY
in the subquery is not needed. And I would replace the IN
with EXISTS
:
UPDATE backlinks bl JOIN
(SELECT bl2.id as bid
FROM backlinks bl2
WHERE bl2.googlebot_id IS NULL AND
bl2.used_time IS NULL AND
EXISTS (SELECT 1 FROM campaigns c WHERE bl2.campaign_id = c.id AND c.status = true)
ORDER BY RAND()
LIMIT 1
) bl2
ON bl.id = bl2.bid
SET bl.crawler_id = '10.0.0.13',
bl.used_time = NOW();
That will help a little, but probably not much. My guess is that the performance issue is the size of the outer sort (or equivalently, the size of the data needed for the GROUP BY
in your query).
You can also get rid of the subquery entirely:
UPDATE backlinks bl
SET bl.crawler_id = '10.0.0.13',
bl.used_time = NOW()
WHERE bl.googlebot_id IS NULL AND
bl.used_time IS NULL AND
EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true)
ORDER BY RAND()
LIMIT 1;
This has minimal impact, but it cleans up the logic a bit.
My guess is that the WHERE
conditions are not very selective, so optimizing them won't help much.
At this point, the issue is the ORDER BY RAND()
. If you have any idea how many rows are being returned by the subquery, then you can pre-filter using RAND()
. For instance, let me assume that there are at least 1,000 rows being returned. Then:
UPDATE backlinks bl
SET bl.crawler_id = '10.0.0.13',
bl.used_time = NOW()
WHERE bl.googlebot_id IS NULL AND
bl.used_time IS NULL AND
EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true) AND
RAND() < 0.01 -- keep about 1/100
ORDER BY RAND()
LIMIT 1;
This significantly speeds the sort, because it is on 100th the data. However, it could filter out all rows -- if not enough rows match the conditions.
Upvotes: 2