Reputation: 909
I have 2 MySQL tables with several millions of rows and I'm trying to perform a query selection for get specific data column from both tables. Despite my first good expectations the execution of the query selection takes several seconds (around 5 seconds) also with an index applied on the WHERE condition.
CREATE TABLE `T1` (
`T1_id` int(15) NOT NULL AUTO_INCREMENT,
`T1_val1` varchar(45) NOT NULL,
`T1_val2` varchar(45) NOT NULL,
`T1_val3` bigint(11) NOT NULL,
`T1_val4` datetime NOT NULL,
`T1_val5` varchar(100) NOT NULL,
`T1_val6` float NOT NULL,
`T1_val7` datetime NOT NULL,
`T1_val8` varchar(100) NOT NULL,
`T1_val9` varchar(100) NOT NULL,
`T1_val10` varchar(100) NOT NULL,
PRIMARY KEY (`T1_id`),
KEY `T1_val4` (`T1_val4`)
) ENGINE=InnoDB AUTO_INCREMENT=53885653 DEFAULT CHARSET=latin1;
CREATE TABLE `T2` (
`T2_id` int(11) NOT NULL,
`T2_val1` float NOT NULL,
`T2_val2` float NOT NULL,
`T2_val3` varchar(45) NOT NULL,
PRIMARY KEY (`T2_id`),
KEY `T2_val3` (`T2_val3`),
KEY `T2_val1_2` (`T2_val1`,`T2_val2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As you can see both tables have an AI primary key and foreign key that matches as one-to-one relationship
between them (T1_id
and T2_id
). And we have an index applied on T1
for T1_val4
that is in a datetime format.
SELECT
T1_val5,
T2_val1,
T2_val2,
T2_val3,
T1_val9,
count(T2_val1) as cnt,
T1_val4
FROM
T1 USE INDEX (T1_val4)
INNER JOIN T2
ON T1.T1_id = T2.T2_id
WHERE
T1_val4 BETWEEN '2016-02-18 15:00:00'
AND '2016-02-18 16:59:59'
GROUP BY
T2_val1,
T2_val2,
T2_val3,
T1_val9,
T1_val5
order by
T1_val4 ASC;
As you can notice, I have specified a HINT for the index in order to tell to MySQL to use that specific index for the datetime column. Actually if I extend the datetime range in the WHERE condition to several hours well.. for example BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 23:59:59'
the execution time grows up till 50/100 seconds. Probably I'm missing something in the logic.
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| 1 | SIMPLE | T1 | range | T1_val4 | T1_val4 | 5 | NULL | 10670 | "Using index condition; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| 1 | SIMPLE | T2 | eq_ref | PRIMARY | PRIMARY | 4 | T1_id | 1 | NULL |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
(As suggested from @O. Jones)
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
| 1 | SIMPLE | T1 | range | "PRIMARY,ix_rlf" | ix_rlf | 5 | NULL | 10906 | "Using where; Using index; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
| 1 | SIMPLE | T2 | eq_ref | "PRIMARY,ix_cc" | PRIMARY | 4 | T1_id | 1 | NULL |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
ix_rlf is the compound index of T1_val4
, T1_val9
, T1_val5
and ix_cc is the compound index suggested from @Tom Shir for T2 made of T2_id
, T2_val1
, T2_val2
, T2_val3
.
(considering 2 HOURS as interval, and in this case the result of the query it's around 6632 rows and 6/7 seconds as execution time)
Upvotes: 2
Views: 205
Reputation: 482
This is your query with table prefixes:
SELECT
T1.T1_val5,
T2.T2_val1,
T2.T2_val2,
T2.T2_val3,
T1.T1_val9,
COUNT(T2.T2_val1) AS cnt,
T1.T1_val4
FROM
T1
INNER JOIN
T2.T2
ON T1.T1_id = T2.T2_id
WHERE
T1.T1_val4 BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 16:59:59'
GROUP BY
T2.T2_val1,
T2.T2_val2,
T2.T2_val3,
T1.T1_val9,
T1.T1_val5
ORDER BY
T1.T1_val4 ASC
I believe you can improve its performance by using the correct indexes. I ran your query through an SQL query optimizer I'm using for my own queries, which recommended to use these indexes:
ALTER TABLE `T1` ADD INDEX `T1_index_1` (`T1_id`, `T1_val4`);
ALTER TABLE `T2` ADD INDEX `T2_index_1` (`T2_id`, `T2_val1`, `T2_val2`, `T2_val3`);
Also, please post the explain plan, as it can help better understand which indexes are currently used by MySQL.
Another recommendation - remove the hint you added. Usually, MySQL will know how to optimize the query better than we do.
Upvotes: 1
Reputation: 108641
By leaving out T1_val4
from your GROUP BY
clause you're taking advantage of MySQL's nonstandard extension. You may be getting undesired results. Please read this. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
Generally, using BETWEEN
for datetime-like columns is a bad idea because it handles range-end conditions poorly. If I were you I'd write this
WHERE T1_val4 >= '2016-02-18 15:00:00' AND T1_val4 < '2016-02-18 17:00:00'
You have the right idea indexing your datestamp column. You might try using a compound covering index instead of your simple datestamp index. It looks like your query is pulling about ten thousand rows from T1
, so a compound covering index will help. Put all the columns you need in the index, with the range scan column first. That means MySQL can satisfy the whole query by doing the index range scan, which is faster. The index should be on these columns.
T1_val4, T1_val9, T1_val5
Because you're using InnoDB, you don't have to include the primary key in your compound index.
This should be a little faster. But, still, you're asking MySQL to retrieve and index ten thousand rows, and that is actually real work.
Upvotes: 1