Reputation: 815
I am using MySQL 8.0 and there is a slow query on a large table to be optimized.
The table contains 11 million rows of data and it's structure:
CREATE TABLE `ccu` (
`id` bigint NOT NULL,
`app_id` int NOT NULL,
`ccu` int NOT NULL,
`audit_create` datetime NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `ccu_game_create_time_2a10bc69_idx` (`app_id`,`audit_create`) USING BTREE,
KEY `ccu_audit_create_idx` (`audit_create`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
My query is:
SELECT app_id, DATE(audit_create) cal_day, MAX(ccu) pcu, ROUND(AVG(ccu)) id_acu
FROM ccu
WHERE audit_create BETWEEN DATE_SUB(DATE(NOW()), INTERVAL 29 DAY) AND DATE(NOW())
GROUP BY app_id, DATE(audit_create)
The query runs over 2 seconds. I add the condition by between ... and ...
to filter useful data.
However, the data stored in audit_create
is in format yyyy-MM-dd HH:mm:ss
, I have to use the date
function but according to the execution plan only the where
condition uses index(still has temporary table), the group by
clause does not use any index at all.
I have no right to alter the table structre to add a date column. Is it possible to optimize the query to lower the query time?
Upvotes: 2
Views: 87
Reputation: 142298
Trivial improvement: DATE(NOW())
--> CURDATE()
Main improvement:
Get rid of id
and change
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `ccu_game_create_time_2a10bc69_idx` (`app_id`,`audit_create`) USING BTREE,
to just
PRIMARY KEY (`app_id`,`audit_create`),
That avoids a secondary lookup for each row.
There seem to be 2.4M rows (out of 11M) in the 29 day range. The Optimizer had to decide whether to use the index (which it did), but suffer 2.4M extra lookups, versus scan all 11M rows, necessitating an extra sort.
Another thing to check is innodb_buffer_pool_size
. If the table is so big that it won't fit in that cache, there may be a lot of I/O. (Again, my index change will help with that.)
Yes, Bill's generated column is likely to add more performance, independently of my suggestion.
Caution:
Your range is 29 days + 1 second.
Bill's range is 30 days.
Regardless of the datatype of audit_create
, this works to get exactly 29 days before this morning:
WHERE audit_create >= CURDATE() - INTERVAL 29 DAY
AND audit_create < CURDATE()
Upvotes: 2
Reputation: 562310
I was able to eliminate the Using temporary
by adding an expression index:
mysql> alter table ccu add key bk1 (app_id, (cast(audit_create as date)));
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain SELECT app_id, DATE(audit_create) cal_day,
MAX(ccu) pcu, ROUND(AVG(ccu)) id_acu
FROM ccu
WHERE date(audit_create) BETWEEN DATE_SUB(DATE(NOW()), INTERVAL 29 DAY) AND DATE(NOW())
GROUP BY app_id, cast(audit_create as date)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ccu
partitions: NULL
type: index
possible_keys: bk1
key: bk1
key_len: 8
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where
Unfortunately, that EXPLAIN report shows it will use type: index which is an index-scan, in other words it will examine every one of the 11 million index entries. It could make it worse than it was in your query.
The only other suggestion I have is to run this query once a day and store the result in a summary table. Running a 2-second query once a day so you can get the aggregate results quickly should be acceptable. But you said you don't have authority to add a column, so I guess you don't have authority to add a table either.
In that case, get a faster computer with more RAM.
Upvotes: 3