Reputation: 45715
I want to query a table T
for rows identified by some column x
. The values of x
, for which the rows should be returned, are determined by some sub-query on table U
, which carries the actual filter condition on columns a
(a short string) and b
(a timestamp value, looking for a range).
SELECT * FROM T
WHERE x IN (
SELECT x FROM U WHERE a = ? AND b BETWEEN ? AND ?
)
ORDER BY x, y
Both tables T
and U
are quite big (in the range of 1M..100M rows). Their definitions are at the end of the question. In the typical use case, about 500 to 1000 rows are in the result set for about 100 different values for x
. These values for x
are not consecutive, and indeed very "random".
This query takes quite some long time (in the range of 0.5 to 2 seconds), although proper indices for both tables T
and U
are used and indeed only a small number of rows is being "examined" (about 1000) according to the slow query log.
The indices are defined as:
idx_u
on table U
covers the columns in the WHERE
clause (a
, b
).idx_t
on table T
covers the columns x
(to be used to find rows) and y
(to be used for the secondary order criterion).The EXPLAIN
for the query above is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE U range idx_u idx_u 24 NULL 107 Using where; Using index; Using temporary; Using filesort
1 SIMPLE T ref idx_t idx_t 8 U.x 4 NULL
I also tried a JOIN
instead of sub-query:
SELECT * FROM T
JOIN U ON T.x = U.x
WHERE a = ? AND b BETWEEN ? AND ?
ORDER BY T.x, T.y
The EXPLAIN
tells exactly the same as above.
However, when executing the sub-query separately, and listing the values for x
literally, both queries together take much less time (in the range of 10 to 20 milliseconds). Also, their EXPLAIN
looks different.
The first query (originally the sub-query):
SELECT x FROM U WHERE a = ? AND b BETWEEN ? AND ?
Its EXPLAIN
matches the first entry from above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE U range idx_u idx_u 24 NULL 107 Using where; Using index; Using temporary; Using filesort
Now the second query:
SELECT * FROM T
WHERE x IN (
3917,8525,13149,17729,22355,26908,31457,36053,40663,45250 -- ... (64 values here)
)
ORDER BY x, y
is explained as
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE T range idx_t idx_t 8 U.x 436 Using index condition
Now I noticed that in the original (composed) query, for table T
, the guess for the number of rows is only 4, which is way too low. Also, the type for T
is ref
, while in the separate query (with explicit values for x
), it is range
.
Might the misguessed row number for the lookup in T
cause the query to be executed in a sub-optimal way? Is the ref
type the reason for the big performance difference?
And finally, to fix this, can I force range
? Note that this is not about forcing an index to be used, as it already seems to be used.
EDIT: The tables have been created like this:
CREATE TABLE `U` (
`x` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`a` varchar(20) CHARACTER SET ascii NOT NULL,
`b` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
-- (more columns irrelevant here)
) ENGINE=InnoDB;
ALTER TABLE `U`
ADD PRIMARY KEY (`x`),
ADD KEY `idx_u` (`a`,`b`) USING BTREE;
CREATE TABLE `T` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`x` bigint(20) UNSIGNED NOT NULL,
`y` int(11) UNSIGNED NOT NULL,
-- (more columns irrelevant here)
) ENGINE=InnoDB;
ALTER TABLE `T`
ADD PRIMARY KEY (`id`),
ADD KEY `idx_t` (`x`,`y`) USING BTREE;
Upvotes: 3
Views: 108
Reputation: 77906
What if you change your query to use WHERE EXISTS
instead of IN
operator and since you already tried with JOIN
conversion
SELECT * FROM T
WHERE EXISTS (SELECT 1 FROM U WHERE T.x = U.x
AND a = ?
AND b BETWEEN ? AND ?
)
ORDER BY x, y;
Strange fact is that, I see you have an index on x,y
(ADD KEY idx_t (x,y) USING BTREE
) but your explain shows it's performing a filesort Using where; Using index; Using filesort
Upvotes: 1
Reputation: 490
You can try with join like
SELECT T.* FROM T
JOIN U on T.x = U.x
WHERE U.a = ? AND U.b BETWEEN ? AND ?
ORDER BY T.x, T.y
Upvotes: 0