Reputation: 1571

how to optimize sql when find the max record of each group while table is large?

I have a table which contains nearly 1 million+ records. I want to find the max record of each group. Here is my sql:

SELECT * 
FROM t 
WHERE id IN (SELECT max(id) AS id 
             FROM t 
             WHERE a = 'some' AND b = 0 
             GROUP BY c, d);

Table declares as follow.

CREATE TABLE `t` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `a` varchar(32) NOT NULL COMMENT 'a',
  `b` tinyint(3) unsigned NOT NULL COMMENT 'b',
  `c` bigint(20) unsigned NOT NULL COMMENT 'c',
  `d` varchar(32) NOT NULL COMMENT 'd',
  PRIMARY KEY (`id`),
  KEY `idx_c_d` (`c`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='test table';

I have a union index on c and d. So the second statement(SELECT max(id) AS id FROM t WHERE a = 'some' AND b = 0 GROUP BY c, d) execute in 200ms. But the total statement cost nearly 6 seconds(The result contains 5000 rows). Here is the explain shows(some columns are omitted).

+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
| select_type | table | type  | possible_keys |  key   |  rows   | filtered |          Extra           |
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
| PRIMARY     | t     | ALL   | NULL          | NULL   | 9926024 |   100.00 | Using where              |
| SUBQUERY    | t     | index | idx_1         | idex_1 | 9926024 |     1.00 | Using where; Using index |
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+

Upvotes: 1

Answers (4)

Gordon Linoff

Reputation: 1269643

I recommend using a correlated subquery:

SELECT t.* 
FROM t 
WHERE t.id IN (SELECT MAX(t2.id)
            FROM t t2
            WHERE t2.c = t.c AND t2.d = t.d AND
                  t2.a = 'some' AND t2.b = 0
           );

This assumes that id is unique in the table.

For performance, you want an index on (c, d, a, b, id).

Upvotes: 0

Kickstart

Reputation: 21513

Avoiding the need for a sub query

SELECT t1.*
FROM t t1
LEFT OUTER JOIN t t2
ON t1.c = t2.c
AND t1.d = t2.d
AND t1.id < t2.id
AND t2.id IS NULL
AND t2.a = 'some' 
AND t2.b = 0

Upvotes: 0

DRapp

Reputation: 48139

All different ways to "skin-a-cat", but here's slightly different... Since you are looking for IN, I would move that query into the front position. Also, it MAY help using MySQL's language specific keyword "STRAIGHT_JOIN" telling MySQL to do in the order you have listed. Again it MAY help

SELECT 
      T.* 
   FROM 
      (SELECT max(id) AS id 
          FROM t 
          WHERE b = 0 
             AND a = 'some' 
          GROUP BY c, d) PQ
      JOIN T
         on PQ.ID = T.ID

I would also have index specifically in order of

(b, a, c, d, id )

Obviously keep the primary ID key, and if using STRAIGHT_JOIN, would be

SELECT STRAIGHT_JOIN 
      T.* ( ... rest of query)

Upvotes: 1

Zaynul Abadin Tuhin

Reputation: 32003

you can try by using corelated subquery and creating index in column c and d

SELECT t1.* FROM table_name t1 
WHERE id = (SELECT max(id) AS id FROM table_name t2 where
             t1.c=t2.c and t1.d=t2.d
            ) and t1.a = 'some' AND t1.b = 0

Upvotes: 0

how to optimize sql when find the max record of each group while table is large?

Answers (4)

Related Questions