Steven Chou
Steven Chou

Reputation: 2215

Mariadb(MySQL):Query slow when using sub query

original sql:

select * from A
join B on A.aid = b.aid
...(some join)
limit 0,1000

it will return results in 1s, but when using subquery, it was became really slow(in minutes):

select * from
(
select * from A
join B on A.aid = b.aid
...(some join)
) tmp
limit 0,1000

when I execute with explain, it has a additional row:

select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: null
key_len: null
rows: 1504940
Extra: Using temporary; Using filesort

MySQL version:5.6

ps. I need to use the query in subquery way, because it is happens in a view read by another application.

Upvotes: 2

Views: 2486

Answers (2)

Rick James
Rick James

Reputation: 142298

IN ( SELECT ... ) is notorious for being poorly optimized. In recent releases it is getting better.

FROM ( SELECT ... ) 
JOIN ( SELECT ... ) ON ...

used to be terrible for performance because it had no indexes for the ON. In 5.6 the Optimizer scratches its head and decides what index might be good, then creates such (cf "auto-key" in EXPLAIN) to help.

Wrapping a query with SELECT ( ... ) LIMIT ... (as in your example) requires the inner query to finish, and build a potentially large (1504940-row?) temp table. All of this before peeling off 1000 rows. Sure the Optimizer might be able to recognize that pattern, but I would argue that "since the outer SELECT adds nothing useful, why do it?". So why should the Optimizer folks waste their time time optimizing such.

Another situation that bugs many people, and they ask "Why does it run so much faster when I remove the ORDER BY?":

SELECT ...
    ORDER BY ...
    LIMIT ...

To add to the puzzlement, some variants do not run any faster. The key factor here is whether there is a composite index that handles all of the WHERE and the ORDER BY.

Upvotes: 0

Yuseferi
Yuseferi

Reputation: 8670

The first approach, with joins, is by far faster. In second the query will be executed for each row. Some databases optimize nested queries into joins though.

Join vs. sub-query

Article MySQL performance: INNER JOIN vs. sub-select

I found using a 'virtual table' instead of a ROW subquery is much faster on my table. It seems that the row subquery isn't optimized, where the join over the 'virtual table' is optimized.

Below are the queries and the 'EXPLAIN' returned for educational purposes.

-- Query using ROW subquery

EXPLAIN
SELECT 
* 
FROM 
region
WHERE 
ROW (PDB,CHAIN) IN (
SELECT 
region.PDB, 
region.CHAIN
FROM 
region LEFT JOIN split_domain USING (SUNID)
WHERE
split_domain.SUNID IS NULL
GROUP BY
PDB, CHAIN
HAVING
COUNT(*)>1
)
LIMIT 
10
;

+----+--------------------+--------------+--------+---------------+---------+---------+------------------------+-------+--------------------------------------+
| id | select_type        | table        | type   | possible_keys | key     | key_len | ref                    | rows  | Extra                                |
+----+--------------------+--------------+--------+---------------+---------+---------+------------------------+-------+--------------------------------------+
|  1 | PRIMARY            | region       | ALL    | NULL          | NULL    |    NULL | NULL                   | 57362 | Using where                          |
|  2 | DEPENDENT SUBQUERY | region       | ALL    | NULL          | NULL    |    NULL | NULL                   | 57362 | Using temporary; Using filesort      |
|  2 | DEPENDENT SUBQUERY | split_domain | eq_ref | PRIMARY       | PRIMARY |       3 | scop_1_65.region.SUNID |     1 | Using where; Using index; Not exists |
+----+--------------------+--------------+--------+---------------+---------+---------+------------------------+-------+--------------------------------------+
3 rows in set (0.04 sec)

I can't get any results out of the above (takes too long) - Perhaps the limit clause isn't kicking in?

-- Query using joined virtual table

EXPLAIN
SELECT 
* 
FROM 
region
INNER JOIN (
SELECT 
region.PDB, 
region.CHAIN
FROM 
region LEFT JOIN split_domain USING (SUNID)
WHERE
split_domain.SUNID IS NULL
GROUP BY
PDB, CHAIN
HAVING
COUNT(*)>1
) AS x
ON
region.PDB = x.PDB 
AND
region.CHAIN = x.CHAIN
LIMIT 
10
;

+----+-------------+--------------+--------+---------------------+-----------+---------+------------------------+-------+--------------------------------------+
| id | select_type | table        | type   | possible_keys       | key       | key_len | ref                    | rows  | Extra                                |
+----+-------------+--------------+--------+---------------------+-----------+---------+------------------------+-------+--------------------------------------+
|  1 | PRIMARY     | <derived2>   | ALL    | NULL                | NULL      |    NULL | NULL                   |  8624 |                                      |
|  1 | PRIMARY     | region       | ref    | PDB,CHAIN,pdb_chain | pdb_chain |       5 | x.PDB,x.CHAIN          |     1 |                                      |
|  2 | DERIVED     | region       | ALL    | NULL                | NULL      |    NULL | NULL                   | 57362 | Using temporary; Using filesort      |
|  2 | DERIVED     | split_domain | eq_ref | PRIMARY             | PRIMARY   |       3 | scop_1_65.region.SUNID |     1 | Using where; Using index; Not exists |
+----+-------------+--------------+--------+---------------------+-----------+---------+------------------------+-------+--------------------------------------+
4 rows in set (1.02 sec)

The above returns ... 10 results in about 1 second 100 results in about 1 second 1000 results in about 1.5 seconds Full set (20437) in about 2 seconds

The former query doesn't return in 5 mins (even with limit 10).

I hope this is useful to anyone designing (or trying to optimize) complex subqueries, and that the precise details of the data are not necessary to convey the results presented here.

Upvotes: 1

Related Questions