OpenHaus
OpenHaus

Reputation: 107

Need advice optimizing SQL query (update on MySQL)

I did a performance profiling on my database with the slow query log. It turned out this is the number one annoyance:

UPDATE
    t1
SET
  v1t1 =
  (
    SELECT
        t2.v3t2
    FROM
        t2
    WHERE
        t2.v2t2 = t1.v2t1
    AND t2.v1t2 <= '2012-04-24'
    ORDER BY
        t2.v1t2 DESC,
        t2.v3t2 DESC
    LIMIT 1
);

The subquery itself is already slow. I tried variations with DISTINCT, GROUP BY and more subqueries but nothing performed below 4 seconds. For example the following query

SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 <= '2012-04-24'
GROUP BY v2t2
ORDER BY v1t2 DESC    

takes:

mysql> SELECT ...
...    
69054 rows in set (5.61 sec)    

mysql> EXPLAIN SELECT ...
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra                                        |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | t2          | ALL  | v1t2          | NULL | NULL    | NULL | 5203965 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+

mysql> SHOW CREATE TABLE t2;
...
  PRIMARY KEY (`v3t2`),
  KEY `v1t2_v3t2` (`v1t2`,`v3t2`),
  KEY `v1t2` (`v1t2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8  

SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|    77070 |
+----------+

SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
|  5203965 |
+----------+

I am trying to fetch the newest entry (v3t2) and its parent (v2t2). Should not be that big of a deal, should it? Does anyone have any advice which knobs I should turn? Any help or hint is greatly appreciated!

This should be a more appropriate SELECT statement:

SELECT
    t1.v2t1,
  (
    SELECT
        t2.v3t2
    FROM
        t2
    WHERE
        t2.v2t2 = t1.v2t1
    AND t2.v1t2 <= '2012-04-24'
    ORDER BY
        t2.v1t2 DESC,
        t2.v3t2 DESC
    LIMIT 1
) AS latest   
FROM
    t1

Upvotes: 2

Views: 141

Answers (3)

0b10011
0b10011

Reputation: 18795

Does this work any better? Gets rid of one of the sorts and groups by the key being used.

UPDATE
    t1
SET
  v1t1 =
  (
    SELECT
        MAX(t2.v3t2)
    FROM
        t2
    WHERE
        t2.v2t2 = t1.v2t1
    AND t2.v1t2 <= '2012-04-24'
    GROUP BY t2.v1t2
    ORDER BY t2.v1t2 DESC
    LIMIT 1
);

Alternate Version

UPDATE `t1`
SET `v1t1` = (
  SELECT MAX(`t2`.`v3t2`)
  FROM `t2`
  WHERE `t2`.`v2t2` = `t1`.`v2t1`
  AND `t2`.`v1t2` = (
    SELECT MAX(`t2`.`v1t2`)
    FROM `t2`
    WHERE `t2`.`v2t2` = `t1`.`v2t1
    AND `t2`.`v1t2` <= '2012-04-24'
    LIMIT 1
  )
  LIMIT 1
);

And add this index to t2:

KEY `v2t2_v1t2` (`v2t2`, `v1t2`)

Upvotes: 0

vyegorov
vyegorov

Reputation: 22875

Your ORDER BY ... LIMIT 1 is forcing database to perform a full scan of the table to return only 1 row. It looks like very much as a candidate for indexing.

Before you build the index, check the fileds selectivity by running:

SELECT count(*), count(v1t2), count(DISTINCT v1t2) FROM t2;

If you're having high number of non-NULL values in your column and number of distinct values is more then 40% of the non-NULLs, then building index is a good thing to go.

If index provides no help, you should analyze the data in your columns. You're using t2.v1t2 <= '2012-04-24' condition, which, in the case you have a historical set of records in your table, will give nothing to the planner, as all rows are expected to be in the past, thus full scan is the best choice anyway. Thus, indexe is useless.

What you should do instead, is think how to rewrite your query in a way, that only a limited subset of records is checked. Your construct ORDER BY ... DESC LIMIT 1 shows that you probably want the most recent entry up to '2012-04-24' (including). Why don't you try to rewrite your query to a something like:

SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 => date_add('2012-04-24' interval '-10' DAY)
GROUP BY v2t2
ORDER BY v1t2 DESC;

This is just an example, knowing the design of your database and nature of your data more precise query can be built.

Upvotes: 1

Kris.Mitchell
Kris.Mitchell

Reputation: 998

I would take a look at indexes that are built for the sub-select t2. You should have a index for v2t2 and possibly one for v1t2, and v3t2 because of the ordering. The index should reduce the time the sub select has to go looking for the results before using them in your update query.

Upvotes: 0

Related Questions