Reputation: 1110
Using MySql 5.6, I'm noticing that combined Select
queries (e.g. select x.a from X x where x.b in (select y.b from Y y where y.c = 'something')
) are way slower than doing two separate queries, using the results of the first query in the in
clause of the second query. And my attempts at using Join
statements instead of nested queries (influenced by other posts on this site) don't produce any performance improvements.
I know this is a common issue with MySql and I've read many postings here on SO about this issue and tried some of the solutions which, apparently, worked for other posters, but not for me.
This query:
select ADSH_ from SECSub where Symbol_='MSFT';
is fast and produces this result:
'0001193125-10-015598'
'0001193125-10-090116'
'0001193125-10-171791'
There are actually 21 results, but I've trimmed them for this posting to 3.
Here's some additional info:
show indexes from SECSub;
produces:
And
explain select * from SECSub where Symbol_='MSFT';
produces:
Querying a different table using the results of the first query, like this:
select * from SECNum where ADSH_ in (
'0001193125-10-015598',
'0001193125-10-090116',
'0001193125-10-171791);
Is similarly fast (.094 seconds). The actual query's in
clause utilized the 21 results from the first query, but again I've trimmed them for this posting to 3.
And this:
show indexes from SECNum;
produces:
And
explain select * from SECNum where ADSH_ in (
'0001193125-10-015598',
'0001193125-10-090116',
'0001193125-10-171791');
produces:
But this combined query:
select *
from SECNum
where ADSH_ in (select ADSH_
from SECSub sub
where Symbol_='MSFT');
Is very slow, taking 151 seconds (compared to about 0.1 second for the previous query).
explain select * from SECNum where ADSH_ in (select ADSH_ from SECSub sub where Symbol_='MSFT');
produces:
So, after reading a few similar posts on SO I though I'd try to re-cast the combined query as a Join
operation:
select *
from SECNum num
inner join SECSub sub on num.ADSH_ = sub.ADSH_
where sub.Symbol_ = 'MSFT';
This result, which took 158 seconds, was even slower than using the combined query, which took 151 seconds.
explain select * from SECNum num inner join SECSub sub on num.ADSH_ = sub.ADSH_ where sub.Symbol_ = 'MSFT';
produced:
select *
from (select sub.ADSH_
from SECSub sub
where sub.Symbol_='MSFT') SubSelect
join SECNum num on SubSelect.ADSH_ = num.ADSH_;
This result clocked in at 151 seconds, the same as my combined query..
explain select * from (select sub.ADSH_ from SECSub sub where sub.Symbol_='MSFT') SubSelect join SECNum num on SubSelect.ADSH_ = num.ADSH_;
produced:
So obviously, I don't know what I'm doing (yet). Any suggestions on how to write a query that produces the same results as my combined query, or any of these Join queries, that runs as fast as the case where I have two separate queries (which was around 0.1 seconds)?
Upvotes: 0
Views: 372
Reputation: 142560
IN ( SELECT ... )
does not optimize well. In fact, until 5.6 it optimizes very poorly. 5.6 adds a technique that helps. But generally it is better to turn it into a JOIN, even with 5.6.
FROM ( SELECT ... ) a
JOIN ( SELECT ... ) b ON ...
Before 5.6, that performs very poorly because neither subquery has an index, hence lots of table scans of one of the tmp table. 5.6 (or is it 5.7?) 'discovers' the optimal index for subqueries, thereby helping significantly.
FROM tbl
JOIN ( SELECT ... ) x ON ...
will always (at least before 5.6) perform the subquery first, into a temporary table. Then it will do a NLJ (Nested Loop Join). So, it behooves you to have an index in tbl for whatever column(s) are in the ON
clause. And make it a compound index if there are multiple columns.
Compound queries are often better than single-column queries. Keep in mind that MySQL almost never uses two indexes in a single SELECT. ("Index merge")
Whenever asking a performance question, please provide SHOW CREATE TABLE
.
With these principles, you should be able to write better-performing queries without having to experiment so much.
Upvotes: 1
Reputation: 1110
First, I tried @Gordon Linoff's suggestion (or implied suggestion) to add a composite index on SECSub consisting of Symbol_ and ADSH_. That made no difference in the performance of any of the queries I tried.
While struggling with this performance issue I noticed that SECNum.ADSC_
was defined as character set latin1
while SECSub.ADSC_
was defined as character set utf8_general_ci
.
I then suspected that when I created the second query by copy and pasting the output from the first query:
select * from SECNum where ADSH_ in (
'0001193125-10-015598',
'0001193125-10-090116',
'0001193125-10-171791');
That the literal strings in the in
clause were using character set latin1
, since they were typed (well, copied and pasted) all from within the MySQL Workbench and that might explain why this query is so fast.
After doing this:
alter table SECSub convert to character set latin1;
The combined query (the subquery) was fast (under 1 second) and for the first time, the explain
showed that the query was using the index. The same was true for the variations using Join
.
I suppose if I had included in my original question the actual table definitions, someone would have pointed out to me that there was an inconsistency in character sets assigned to table columns that participate in indexes and queries. Lesson learned. Next time I post, I'll include the table definitions (at least for those columns participating in indexes and queries that I'm asking about).
Upvotes: 0
Reputation: 1271231
Let me start with this query:
select *
from SECNum
where ADSH_ in (select ADSH_
from SECSub sub
where Symbol_ = 'MSFT');
The optimal index on this would be the composite index SECSub(Symbol_, ADSH_)
. I am guess that because this index is not available, MySQL seems to be making the wrong choice. It is doing a full table scan and checking for the where
condition, rather than using the index to lookup the appropriate rows. A covering index (with the two columns) should put the MySQL optimizer on the right path.
Sometimes, in
with a subquery is not optimized so well (although I thought this was fixed in 5.6). Also try the query with not exists
:
select *
from SECNum sn
where not exists (select ADSH_
from SECSub sub
where sub.Symbol_ = 'MSFT' AND
sub.ADSH_ = sn.ADSH_
);
Upvotes: 1