Reputation: 13
I have a log table, but I find it becomes very slow when I sort it.
Here's my database table structure, in short:
CREATE TABLE `webhook_logs` (
`ID` bigint(20) UNSIGNED NOT NULL,
`event_id` bigint(20) UNSIGNED DEFAULT NULL,
`object_id` bigint(20) UNSIGNED DEFAULT NULL,
`occurred_at` bigint(20) UNSIGNED DEFAULT NULL,
`payload` text COLLATE utf8mb4_unicode_520_ci,
`priority` bigint(1) UNSIGNED DEFAULT NULL,
`status` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
ALTER TABLE `webhook_logs`
ADD PRIMARY KEY (`ID`),
ADD KEY `event_id` (`event_id`),
ADD KEY `object_id` (`object_id`),
ADD KEY `occurred_at` (`occurred_at`),
ADD KEY `priority` (`priority`),
ADD KEY `status` (`status`);
There are 5M + records.
When I do
SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 ORDER BY priority ASC LIMIT 100
, it took about 5 seconds to get the records.
However, when I remove the sorting and just do
SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 LIMIT 100
, it took only 0.0022 seconds.
I've been playing around with the indexes to see if the time improves, but with no luck. I wonder if I'm doing something wrong here.
I tried creating a composite index on "occurred_at" and "priority", and a composite index on all of "occurred_at", "priority" and "status". Neither of them improved the speed; it still takes around 5 seconds. If it helps, the server is running MySQL 5.7.12.
Any help will be appreciated. Thanks.
Upvotes: 0
Views: 133
Reputation: 142298
You don't need BIGINT for most of those columns. That datatype takes 8 bytes; there are much smaller datatypes. priority could be TINYINT UNSIGNED (1 byte, range of 0..255). status could be changed to a 1-byte ENUM. Such changes will shrink the data and index sizes, hence speed up most operations somewhat.
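As a sketch, those type changes might look like the following. ('pending' comes from your query; the other ENUM values are assumptions, so substitute the real set of status values your application uses:)
```sql
ALTER TABLE webhook_logs
    MODIFY priority TINYINT UNSIGNED DEFAULT NULL,
    -- 'pending' appears in your query; 'sent' and 'failed' are placeholders
    MODIFY status ENUM('pending','sent','failed')
        COLLATE utf8mb4_unicode_520_ci DEFAULT NULL;
```
Note that ALTER TABLE rewrites the table, which can take a while on 5M+ rows.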
Replace INDEX(status) with:
INDEX(status, occurred_at, priority, id) -- in this order
Then your query will run somewhat faster, depending on the distribution of the data.
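Expressed as DDL, that index replacement might be (the index name is arbitrary):
```sql
ALTER TABLE webhook_logs
    DROP INDEX `status`,
    ADD INDEX status_occurred_priority_id (status, occurred_at, priority, ID);
```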
This might run even faster:
SELECT w.*
    FROM (
        SELECT id
            FROM `webhook_logs`
            WHERE status = 'pending'
              AND occurred_at < 1652838913000
            ORDER BY priority ASC
            LIMIT 100
         ) AS t
    JOIN webhook_logs AS w USING(id)
    ORDER BY priority ASC   -- yes, this is repeated
;
That is because it can pick the 100 ids from the index much faster since the index is "covering", then do only 100 lookups to fetch "*".
Upvotes: 0
Reputation: 21
An index alone can't solve your problem. In your query, the DB must first find all records where "occurred_at < 1652838913000" and then sort them to get the records with the highest priority. No index can eliminate that sort.
But there is a solution, because priority only ever takes a few distinct values. You can create an index (status, priority, occurred_at), and then write a query like this:
select * from (
    (SELECT * FROM `webhook_logs`
      WHERE status = 'pending' AND priority = 1 AND occurred_at < 1652838913000
      LIMIT 100)
    UNION ALL  -- the two branches are disjoint, so ALL skips the dedup pass
    (SELECT * FROM `webhook_logs`
      WHERE status = 'pending' AND priority = 2 AND occurred_at < 1652838913000
      LIMIT 100)
) a ORDER BY priority ASC LIMIT 100
In this query, the DB will use the index for each subquery of the union, and then sort only a few rows (at most 200). The result can be returned in less than 0.1 seconds.
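The index that query relies on could be created like this (the index name is arbitrary; add one branch to the union per distinct priority value in your data):
```sql
ALTER TABLE webhook_logs
    ADD INDEX status_priority_occurred (status, priority, occurred_at);
```
Within each branch, all rows match a single (status, priority) prefix, so the index delivers them already filtered by occurred_at and no per-branch sort is needed.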
Upvotes: 2