Gummi

Reputation: 13

How should I properly index a MySQL column when dealing with sorting?

I have a log table, but I find it becomes very slow when I sort it.

Here's my table structure, in brief:

CREATE TABLE `webhook_logs` (
  `ID` bigint(20) UNSIGNED NOT NULL,
  `event_id` bigint(20) UNSIGNED DEFAULT NULL,
  `object_id` bigint(20) UNSIGNED DEFAULT NULL,
  `occurred_at` bigint(20) UNSIGNED DEFAULT NULL,
  `payload` text COLLATE utf8mb4_unicode_520_ci,
  `priority` bigint(1) UNSIGNED DEFAULT NULL,
  `status` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

ALTER TABLE `webhook_logs`
  ADD PRIMARY KEY (`ID`),
  ADD KEY `event_id` (`event_id`),
  ADD KEY `object_id` (`object_id`),
  ADD KEY `occurred_at` (`occurred_at`),
  ADD KEY `priority` (`priority`),
  ADD KEY `status` (`status`);

There are 5M+ records.

When I do

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 ORDER BY priority ASC LIMIT 100

, it takes about 5 seconds to return the records.


However, when I remove the sorting and just do

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 LIMIT 100

, it takes only 0.0022 seconds.


I've been playing around with the indexes to see if the time improves, but with no luck. I wonder if I'm doing something wrong here.

I tried creating a composite index on "occurred_at" and "priority", and another composite index on "occurred_at", "priority", and "status" (roughly as sketched below). Neither improved the speed; the query still takes around 5 seconds. In case it helps, the server is running MySQL 5.7.12.
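
For reference, the composite indexes I tried looked roughly like this (the index names here are just placeholders, not the exact ones I used):

ALTER TABLE `webhook_logs`
  ADD KEY `occurred_priority` (`occurred_at`, `priority`),
  ADD KEY `occurred_priority_status` (`occurred_at`, `priority`, `status`);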

Any help will be appreciated. Thanks.

Upvotes: 0

Views: 133

Answers (2)

Rick James

Reputation: 142298

You don't need BIGINT for most of those columns. That datatype takes 8 bytes. There are much smaller datatypes. priority could be TINYINT UNSIGNED (1 byte, range of 0..255). status could be changed to a 1-byte ENUM. Such changes will shrink the data and index sizes, hence speed up most operations somewhat.
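
A minimal sketch of those datatype changes, assuming the set of status values is small and fixed (the ENUM members below are placeholders, not a confirmed list):

ALTER TABLE `webhook_logs`
  MODIFY `priority` TINYINT UNSIGNED DEFAULT NULL,
  MODIFY `status` ENUM('pending','sent','failed') DEFAULT NULL;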

Replace INDEX(status) with

INDEX(status, occurred_at, priority, id)  -- in this order

Then your query will run somewhat faster, depending on the distribution of the data.
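
In DDL form, that replacement might look like this (assuming the existing single-column index is still named `status` as in the question; the new index name is just an example):

ALTER TABLE `webhook_logs`
  DROP KEY `status`,
  ADD KEY `status_occurred_priority_id` (`status`, `occurred_at`, `priority`, `ID`);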

This might run even faster:

SELECT  w.*
    FROM  (
        SELECT id
            FROM `webhook_logs`
            WHERE  status = 'pending'
              AND  occurred_at < 1652838913000
            ORDER BY  priority ASC
            LIMIT  100
          ) AS t
    JOIN webhook_logs AS w  USING(id)
    ORDER BY priority ASC    -- yes, this is repeated
    ;

That is because it can pick the 100 ids from my index much faster since it is "covering", then do 100 lookups to get "*".
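
To check whether the index is actually being used as a covering index, you can run EXPLAIN on the inner query and look for "Using index" in the Extra column:

EXPLAIN SELECT id
    FROM `webhook_logs`
    WHERE status = 'pending'
      AND occurred_at < 1652838913000
    ORDER BY priority ASC
    LIMIT 100;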

Upvotes: 0

Donald

Reputation: 21

An index alone can't solve your problem. In your query, the DB must first find all records where "occurred_at < 1652838913000" and then sort them to get the records with the highest priority. No index can eliminate that sort.

But there is a solution, because priority only ever takes a few distinct values. You can create an index on (status, priority, occurred_at) and then write a query like this:

SELECT * FROM (
    (SELECT * FROM `webhook_logs` WHERE status = 'pending' AND priority = 1 AND occurred_at < 1652838913000 LIMIT 100)
    UNION
    (SELECT * FROM `webhook_logs` WHERE status = 'pending' AND priority = 2 AND occurred_at < 1652838913000 LIMIT 100)
) a ORDER BY priority ASC LIMIT 100

In this query, the DB will use the index for each subquery of the union and then sort only a very small number of rows. The result can be returned in less than 0.1 seconds.
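
The index itself would be created roughly like this (the index name is just an example):

ALTER TABLE `webhook_logs`
  ADD KEY `status_priority_occurred` (`status`, `priority`, `occurred_at`);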

Upvotes: 2
