UgoL
UgoL

Reputation: 909

Improve speed execution of MySQL select query

I have 2 MySQL tables with several millions of rows and I'm trying to perform a query selection for get specific data column from both tables. Despite my first good expectations the execution of the query selection takes several seconds (around 5 seconds) also with an index applied on the WHERE condition.

Create statement for the 2 MySQL tables:

CREATE TABLE `T1` (
  `T1_id` int(15) NOT NULL AUTO_INCREMENT, 
  `T1_val1` varchar(45) NOT NULL,
  `T1_val2` varchar(45) NOT NULL,
  `T1_val3` bigint(11) NOT NULL,
  `T1_val4` datetime NOT NULL, 
  `T1_val5` varchar(100) NOT NULL, 
  `T1_val6` float NOT NULL,
  `T1_val7` datetime NOT NULL,
  `T1_val8` varchar(100) NOT NULL,
  `T1_val9` varchar(100) NOT NULL, 
  `T1_val10` varchar(100) NOT NULL,
  PRIMARY KEY (`T1_id`),
  KEY `T1_val4` (`T1_val4`)
) ENGINE=InnoDB AUTO_INCREMENT=53885653 DEFAULT CHARSET=latin1;

CREATE TABLE `T2` (
  `T2_id` int(11) NOT NULL,
  `T2_val1` float NOT NULL,
  `T2_val2` float NOT NULL,
  `T2_val3` varchar(45) NOT NULL,
  PRIMARY KEY (`T2_id`),
  KEY `T2_val3` (`T2_val3`),
  KEY `T2_val1_2` (`T2_val1`,`T2_val2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

As you can see both tables have an AI primary key and foreign key that matches as one-to-one relationship between them (T1_id and T2_id). And we have an index applied on T1 for T1_val4 that is in a datetime format.

The selection query

SELECT 
      T1_val5, 
      T2_val1, 
      T2_val2, 
      T2_val3, 
      T1_val9, 
      count(T2_val1) as cnt, 
      T1_val4
   FROM 
      T1 USE INDEX (T1_val4)
         INNER JOIN T2 
            ON T1.T1_id = T2.T2_id 
   WHERE 
      T1_val4 BETWEEN '2016-02-18 15:00:00' 
                  AND '2016-02-18 16:59:59'
   GROUP BY 
      T2_val1, 
      T2_val2, 
      T2_val3, 
      T1_val9, 
      T1_val5
   order by 
      T1_val4 ASC;

As you can notice, I have specified a HINT for the index in order to tell to MySQL to use that specific index for the datetime column. Actually if I extend the datetime range in the WHERE condition to several hours well.. for example BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 23:59:59' the execution time grows up till 50/100 seconds. Probably I'm missing something in the logic.

The EXPLAIN output

+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
|   ID  |   SELECT_TYPE |   TABLE   |   TYPE    |   POSSIBLE_KEYS   |   KEY     |   KEY_LEN     |   REF     |   ROWS    |       EXTRA                                                |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
|   1   |   SIMPLE      |   T1      |   range   |   T1_val4         |   T1_val4 |      5        |   NULL    |   10670   |   "Using index condition; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
|   1   |   SIMPLE      |   T2      |   eq_ref  |   PRIMARY         |   PRIMARY |      4        |   T1_id   |   1       |     NULL                                                   |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+

The EXPLAIN output updated with compound index

(As suggested from @O. Jones)

+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
|   ID  |   SELECT_TYPE |   TABLE   |   TYPE    |   POSSIBLE_KEYS              |   KEY     |   KEY_LEN     |   REF     |   ROWS    |       EXTRA                                                   |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
|   1   |   SIMPLE      |   T1      |   range   |   "PRIMARY,ix_rlf"           |   ix_rlf  |      5        |   NULL    |   10906   |   "Using where; Using index; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
|   1   |   SIMPLE      |   T2      |   eq_ref  |   "PRIMARY,ix_cc"            |   PRIMARY |      4        |   T1_id   |   1       |     NULL                                                      |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+

ix_rlf is the compound index of T1_val4, T1_val9, T1_val5and ix_cc is the compound index suggested from @Tom Shir for T2 made of T2_id, T2_val1, T2_val2, T2_val3.

Query execution plan visual schema

(considering 2 HOURS as interval, and in this case the result of the query it's around 6632 rows and 6/7 seconds as execution time)

query execution plan schema

Upvotes: 2

Views: 205

Answers (2)

Tom Shir
Tom Shir

Reputation: 482

This is your query with table prefixes:

SELECT
        T1.T1_val5,
        T2.T2_val1,
        T2.T2_val2,
        T2.T2_val3,
        T1.T1_val9,
        COUNT(T2.T2_val1) AS cnt,
        T1.T1_val4 
    FROM
        T1 
    INNER JOIN
        T2.T2 
            ON T1.T1_id = T2.T2_id 
    WHERE
        T1.T1_val4 BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 16:59:59' 
    GROUP BY
        T2.T2_val1,
        T2.T2_val2,
        T2.T2_val3,
        T1.T1_val9,
        T1.T1_val5 
    ORDER BY
        T1.T1_val4 ASC

I believe you can improve its performance by using the correct indexes. I ran your query through an SQL query optimizer I'm using for my own queries, which recommended to use these indexes:

ALTER TABLE `T1` ADD INDEX `T1_index_1` (`T1_id`, `T1_val4`);
ALTER TABLE `T2` ADD INDEX `T2_index_1` (`T2_id`, `T2_val1`, `T2_val2`, `T2_val3`);

Also, please post the explain plan, as it can help better understand which indexes are currently used by MySQL.

Another recommendation - remove the hint you added. Usually, MySQL will know how to optimize the query better than we do.

Upvotes: 1

O. Jones
O. Jones

Reputation: 108641

By leaving out T1_val4 from your GROUP BY clause you're taking advantage of MySQL's nonstandard extension. You may be getting undesired results. Please read this. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html

Generally, using BETWEEN for datetime-like columns is a bad idea because it handles range-end conditions poorly. If I were you I'd write this

WHERE T1_val4 >= '2016-02-18 15:00:00' AND T1_val4 < '2016-02-18 17:00:00'

You have the right idea indexing your datestamp column. You might try using a compound covering index instead of your simple datestamp index. It looks like your query is pulling about ten thousand rows from T1, so a compound covering index will help. Put all the columns you need in the index, with the range scan column first. That means MySQL can satisfy the whole query by doing the index range scan, which is faster. The index should be on these columns.

 T1_val4, T1_val9, T1_val5

Because you're using InnoDB, you don't have to include the primary key in your compound index.

This should be a little faster. But, still, you're asking MySQL to retrieve and index ten thousand rows, and that is actually real work.

Upvotes: 1

Related Questions