Sergey Vorobev
Sergey Vorobev

Reputation: 51

Very slow GROUP BY and ORDER BY

I am trying to create correct indexes, but there is a problem

Here is the table structure:

CREATE TABLE mail (
    id            int                                                           NOT NULL,
    account_id    int                                                               NULL DEFAULT NULL,
    folder_path   varchar(150) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
    mail_id       int                                                               NULL DEFAULT NULL,
    user_id       int                                                               NULL DEFAULT NULL COMMENT 'текущий менеджер',
    ticket_number varchar(20)  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
    from_user_id  int                                                               NULL DEFAULT NULL,
    to_address    longtext     CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    reply_to      longtext     CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    bcc           longtext     CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    cc            longtext     CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    subject       varchar(350) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci     NULL DEFAULT NULL,
    read_status   tinyint(1)                                                    NOT NULL DEFAULT '0',
    created_at    timestamp                                                     NOT NULL,
    updated_at    timestamp                                                         NULL DEFAULT NULL

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
    
ALTER TABLE mail

    ADD KEY from_address      ( from_address(255) ),
    ADD KEY to_address        ( to_address(255) ),
    ADD KEY subject           ( subject ),
    ADD KEY id                ( id ) USING BTREE,
    ADD KEY user_id           ( user_id ),
    ADD KEY created_at        ( created_at ),
    ADD KEY folder_path       ( folder_path, ticket_number, id, created_at ) USING BTREE,
    ADD KEY folder_mail_count ( folder_path, read_status ),
    ADD KEY ticket_number     ( ticket_number, folder_path, id ) USING BTREE;
    
ALTER TABLE mail
    MODIFY id int NOT NULL AUTO_INCREMENT;

With next query, I get a list of emails grouped by ticket and sorted by date created_at:

SELECT
    m1.*,
    m1.mail_at AS receivedAt
FROM
    mail AS m1
    LEFT JOIN mail AS m2 ON
        m1.ticket_number = m2.ticket_number
        AND
        m1.folder_path = m2.folder_path
        AND
        m1.id < m2.id
WHERE
    ( m1.folder_path = 'INBOX' )
    AND
    m2.id IS NULL
GROUP BY
    m1.ticket_number
ORDER BY
    m1.created_at DESC
    LIMIT 20 OFFSET 0;

There are about 200k records in the table. The execution time is 1.3 - 1.5 seconds

+----+-------------+-------+------------+------+------------------------------------------------+-------------+---------+-----------------------------------------------------------+-------+----------+--------------------------------------+
| id | select_type | table | partitions | type | possible_keys                                  | key         | key_len | ref                                                       | rows  | filtered | Extra                                |
+----+-------------+-------+------------+------+------------------------------------------------+-------------+---------+-----------------------------------------------------------+-------+----------+--------------------------------------+
|  1 | SIMPLE      | m1    | NULL       | ref  | folder_path,folder_mail_count,ticket_number    | folder_path | 602     | const                                                     | 47494 |   100.00 | Using temporary; Using filesort      |
|  1 | SIMPLE      | m2    | NULL       | ref  | id,folder_path,folder_mail_count,ticket_number | folder_path | 684     | sandbox_busf.m1.folder_path,sandbox_busf.m1.ticket_number |     1 |    10.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+------+------------------------------------------------+-------------+---------+-----------------------------------------------------------+-------+----------+--------------------------------------+

If I remove ORDER BY, the query is executed in 0.004 sec.

+----+-------------+-------+------------+------+------------------------------------------------+-------------+---------+-----------------------------------------------------------+-------+----------+--------------------------------------+
| id | select_type | table | partitions | type | possible_keys                                  | key         | key_len | ref                                                       | rows  | filtered | Extra                                |
+----+-------------+-------+------------+------+------------------------------------------------+-------------+---------+-----------------------------------------------------------+-------+----------+--------------------------------------+
|  1 | SIMPLE      | m1    | NULL       | ref  | folder_path,folder_mail_count,ticket_number    | folder_path | 602     | const                                                     | 47494 |   100.00 | NULL                                 |
|  1 | SIMPLE      | m2    | NULL       | ref  | id,folder_path,folder_mail_count,ticket_number | folder_path | 684     | sandbox_busf.m1.folder_path,sandbox_busf.m1.ticket_number |     1 |    10.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+------+------------------------------------------------+-------------+---------+-----------------------------------------------------------+-------+----------+--------------------------------------+

Please help me create correct indexes.

Upvotes: 0

Views: 81

Answers (1)

Rick James
Rick James

Reputation: 142306

      AND  m1.folder_path = m2.folder_path
      AND  m1.id < m2.id

smells like "group-wise" max as mentioned in the reference manual. It is terribly inefficient. Better choices are in Groupwise-Max

id feels like a PRIMARY KEY, but it is not declared as such. Every table needs a PK.

It's not just "removing the ORDER BY", it is also leaving behind the "LIMIT".

With just a LIMIT, any 20 rows can be delivered; adding in the ORDER BY, requires sorting all the data found by the GROUP BY, and only then peel off 20 rows.

Upvotes: 2

Related Questions