Reputation: 6352
I'm working on an application that needs to get the latest values from a table with currently > 3 million rows and counting. The latest values need to be grouped by two columns/attributes, so it runs the following query:
SELECT
    m1.type,
    m1.cur,
    ROUND(m1.val, 2) AS val
FROM minuteCharts m1
JOIN
    (SELECT
        cur,
        type,
        MAX(id) id,
        ROUND(val) AS val
    FROM minuteCharts
    GROUP BY cur, type) m2
    ON m1.cur = m2.cur AND m1.id = m2.id;
The database server is quite the heavyweight, but the above query takes 3,500ms to complete and this number is rising. I suspect this wasn't a real problem when the application was just launched (as the database was pretty much empty back then), but it's becoming a problem and I haven't found a better solution. In fact, similar questions on SO actually had something like the above as their answers (which is probably where the developer got it from).
Is there anyone out there who knows how to get the same results more efficiently?
UPDATE: I submitted this too early.
EXPLAIN minuteCharts;

Field  Type                              Null  Key  Default  Extra
id     int(255)                          NO    PRI  NULL     auto_increment
time   datetime                          NO    MUL  NULL
cur    enum('EUR','USD')                 NO         NULL
type   enum('GOLD','SILVER','PLATINUM')  NO         NULL
val    varchar(80)                       NO         NULL

id is the primary key and there's an index on time.
Upvotes: 0
Views: 61
Reputation: 562861
The subquery with GROUP BY is doing a full table-scan and building a temporary table, because there's no index to support the grouping.
mysql> EXPLAIN SELECT m1.type, m1.cur, ROUND(m1.val, 2) AS val
    -> FROM minuteCharts m1
    -> JOIN (SELECT cur, type, MAX(id) id, ROUND(val) AS val
    ->       FROM minuteCharts
    ->       GROUP BY cur, type) m2
    -> ON m1.cur = m2.cur AND m1.id = m2.id;
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
| 1 | PRIMARY | m1 | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 6 | test.m1.cur,test.m1.id | 2 | NULL |
| 2 | DERIVED | minuteCharts | ALL | NULL | NULL | NULL | NULL | 1 | Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+-------------+---------+------------------------+------+---------------------------------+
You can improve this with the following index, sorted first by your GROUP BY columns, then also including the other columns for the subquery to make it a covering index:
mysql> ALTER TABLE minuteCharts ADD KEY (cur,type,id,val);
The table-scans turn into index scans (still not great, but better), and the temporary table goes away.
mysql> EXPLAIN ...
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
| 1 | PRIMARY | m1 | index | PRIMARY,cur | cur | 88 | NULL | 1 | Using index |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 6 | test.m1.cur,test.m1.id | 2 | NULL |
| 2 | DERIVED | minuteCharts | index | cur | cur | 88 | NULL | 1 | Using index |
+----+-------------+--------------+-------+---------------+-------------+---------+------------------------+------+-------------+
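As an aside, if you're on MySQL 8.0 or later, the same "latest row per group" result can be written with a window function instead of the self-join. This is a sketch, not the original query, so verify the plan with EXPLAIN against your own data:

```sql
-- Rank the rows within each (cur, type) group by descending id,
-- then keep only the newest row from each group.
SELECT type, cur, ROUND(val, 2) AS val
FROM (
    SELECT type, cur, val,
           ROW_NUMBER() OVER (PARTITION BY cur, type ORDER BY id DESC) AS rn
    FROM minuteCharts
) ranked
WHERE rn = 1;
```

The (cur,type,id,val) index above can serve this form as a covering index as well, since the partitioning, ordering, and selected columns are all in the index.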
Best results will be if the index fits in your buffer pool. If it's larger than the buffer pool, the query will have to push pages in and out repeatedly during the index scan, which will greatly degrade performance.
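To get a rough idea of whether the index will fit, you can compare the table's index size against the buffer pool. A sketch, assuming InnoDB (the numbers reported by information_schema are estimates):

```sql
-- Buffer pool size in MB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Total size of all indexes on the table, in MB.
SELECT index_length / 1024 / 1024 AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'minuteCharts';
```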
Re your comment:
The answer to how long it'll take to add the index depends on the version of MySQL you have, the storage engine for this table, your server hardware, the number of rows in the table, the level of concurrent load on the database, etc. In other words, I have no way of telling.
I'd suggest using pt-online-schema-change, so you will have no downtime.
Another suggestion would be to try it on a staging server with a clone of your database, so you can get a rough estimate how long it'll take (although testing on an idle server is often a lot quicker than running the same change on a busy server).
Upvotes: 1