Reputation: 107
I did a performance profiling on my database with the slow query log. It turned out this is the number one annoyance:
UPDATE
t1
SET
v1t1 =
(
SELECT
t2.v3t2
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
ORDER BY
t2.v1t2 DESC,
t2.v3t2 DESC
LIMIT 1
);
The subquery itself is already slow. I tried variations with DISTINCT, GROUP BY and more subqueries but nothing performed below 4 seconds. For example the following query
SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 <= '2012-04-24'
GROUP BY v2t2
ORDER BY v1t2 DESC
takes:
mysql> SELECT ...
...
69054 rows in set (5.61 sec)
mysql> EXPLAIN SELECT ...
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | t2 | ALL | v1t2 | NULL | NULL | NULL | 5203965 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
mysql> SHOW CREATE TABLE t2;
...
PRIMARY KEY (`v3t2`),
KEY `v1t2_v3t2` (`v1t2`,`v3t2`),
KEY `v1t2` (`v1t2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
| 77070 |
+----------+
SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
| 5203965 |
+----------+
I am trying to fetch the newest entry (v3t2) and its parent (v2t2). Should not be that big of a deal, should it? Does anyone have any advice which knobs I should turn? Any help or hint is greatly appreciated!
This should be a more appropriate SELECT statement:
SELECT
t1.v2t1,
(
SELECT
t2.v3t2
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
ORDER BY
t2.v1t2 DESC,
t2.v3t2 DESC
LIMIT 1
) AS latest
FROM
t1
Upvotes: 2
Views: 141
Reputation: 18795
Does this work any better? Gets rid of one of the sorts and groups by the key being used.
UPDATE
t1
SET
v1t1 =
(
SELECT
MAX(t2.v3t2)
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
GROUP BY t2.v1t2
ORDER BY t2.v1t2 DESC
LIMIT 1
);
UPDATE `t1`
SET `v1t1` = (
SELECT MAX(`t2`.`v3t2`)
FROM `t2`
WHERE `t2`.`v2t2` = `t1`.`v2t1`
AND `t2`.`v1t2` = (
SELECT MAX(`t2`.`v1t2`)
FROM `t2`
WHERE `t2`.`v2t2` = `t1`.`v2t1
AND `t2`.`v1t2` <= '2012-04-24'
LIMIT 1
)
LIMIT 1
);
And add this index to t2
:
KEY `v2t2_v1t2` (`v2t2`, `v1t2`)
Upvotes: 0
Reputation: 22875
Your ORDER BY ... LIMIT 1
is forcing database to perform a full scan of the table to return only 1 row. It looks like very much as a candidate for indexing.
Before you build the index, check the fileds selectivity by running:
SELECT count(*), count(v1t2), count(DISTINCT v1t2) FROM t2;
If you're having high number of non-NULL
values in your column and number of distinct values is more then 40% of the non-NULL
s, then building index is a good thing to go.
If index provides no help, you should analyze the data in your columns. You're using t2.v1t2 <= '2012-04-24'
condition, which, in the case you have a historical set of records in your table, will give nothing to the planner, as all rows are expected to be in the past, thus full scan is the best choice anyway. Thus, indexe is useless.
What you should do instead, is think how to rewrite your query in a way, that only a limited subset of records is checked. Your construct ORDER BY ... DESC LIMIT 1
shows that you probably want the most recent entry up to '2012-04-24'
(including). Why don't you try to rewrite your query to a something like:
SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 => date_add('2012-04-24' interval '-10' DAY)
GROUP BY v2t2
ORDER BY v1t2 DESC;
This is just an example, knowing the design of your database and nature of your data more precise query can be built.
Upvotes: 1
Reputation: 998
I would take a look at indexes that are built for the sub-select t2. You should have a index for v2t2 and possibly one for v1t2, and v3t2 because of the ordering. The index should reduce the time the sub select has to go looking for the results before using them in your update query.
Upvotes: 0