Get latest values grouped by two columns - faster

Question

I'm working on an application that needs to get the latest values from a table with currently > 3 million rows and counting. The latest values need to be grouped by two columns/attributes, so it runs the following query:

SELECT 
        m1.type,
        m1.cur,
        ROUND(m1.val, 2) AS val
    FROM minuteCharts m1
    JOIN
        (SELECT
            cur,
            type,
            MAX(id) id,
            ROUND(val) AS val
        FROM minuteCharts
        GROUP BY cur, type) m2
    ON m1.cur = m2.cur AND m1.id = m2.id;

The database server is quite the heavyweight, but the above query takes 3,500ms to complete and this number is rising. I suspect this wasn't a real problem when the application was just launched (as the database was pretty much empty back then), but it's becoming a problem and I haven't found a better solution. In fact, similar questions on SO actually had something like the above as their answers (which is probably where the developer got it from).

Is there anyone out there who knows how to get the same results more efficiently?

UPDATE: I submitted this too early.

EXPLAIN minuteCharts;

Field   Type                              Null  Key     Default    Extra
id      int(255)                          NO    PRI         NULL   auto_increment
time    datetime                          NO    MUL         NULL    
cur     enum('EUR','USD')                 NO                NULL    
type    enum('GOLD','SILVER','PLATINUM')  NO                NULL    
val     varchar(80)                       NO                NULL

id is the primary index and there's an index on time.

Bill Karwin · Accepted Answer

The subquery with GROUP BY is doing a table-scan and a temporary table, because there's no index to support it.

mysql> EXPLAIN SELECT m1.type, m1.cur, ROUND(m1.val, 2) AS val FROM minuteCharts m1 JOIN         (SELECT cur, type, MAX(id) id, ROUND(val) AS val FROM minuteCharts GROUP BY cur, type) m2     ON m1.cur = m2.cur AND m1.id = m2.id;
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| id | select_type | table        | type | possible_keys | key         | key_len | ref                    | rows | Extra                           |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
|  1 | PRIMARY     | m1           | ALL  | PRIMARY       | NULL        | NULL    | NULL                   |    1 | NULL                            |
|  1 | PRIMARY     |    | ref  |    |  | 6       | test.m1.cur,test.m1.id |    2 | NULL                            |
|  2 | DERIVED     | minuteCharts | ALL  | NULL          | NULL        | NULL    | NULL                   |    1 | Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+

You can improve this with the following index, sorted first by your GROUP BY columns, then also including the other columns for the subquery to make it a covering index:

mysql> ALTER TABLE minuteCharts ADD KEY (cur,type,id,val);

The table-scans turn into index scans (still not great but better), and the temp table goes away.

mysql> EXPLAIN ...
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| id | select_type | table        | type  | possible_keys | key         | key_len | ref                    | rows | Extra       |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
|  1 | PRIMARY     | m1           | index | PRIMARY,cur   | cur         | 88      | NULL                   |    1 | Using index |
|  1 | PRIMARY     |    | ref   |    |  | 6       | test.m1.cur,test.m1.id |    2 | NULL        |
|  2 | DERIVED     | minuteCharts | index | cur           | cur         | 88      | NULL                   |    1 | Using index |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+

Best results will be if the index fits in your buffer pool. If it's larger than the buffer pool, the query will have to push pages in and out repeatedly during the index scan, which will greatly degrade performance.

Re your comment:

The answer to how long it'll take to add the index depends on the version of MySQL you have, the storage engine for this table, your server hardware, the number of rows in the table, the level of concurrent load on the database, etc. In other words, I have no way of telling.

I'd suggest using pt-online-schema-change, so you will have no downtime.

Another suggestion would be to try it on a staging server with a clone of your database, so you can get a rough estimate how long it'll take (although testing on an idle server is often a lot quicker than running the same change on a busy server).

Get latest values grouped by two columns - faster

Answers (1)

Related Questions