Reputation: 812
This question is really just out of curiosity. Can someone shed some light onto why this query:
insert into c (
select a.* from a,b
where a.x=8 and y=1 and b.z=a.z and active > '2010-01-07 00:00:00'
group by a.z
)
is a million times faster (or close enough) than the query below whenever there's quite a few records involved?
insert into c (
select * from a
where x=8 and y=1 and z in (
select z from b where x=8 and active > '2010-01-07 00:00:00' group by z
)
)
What I mean is that a query joining a table is so much faster than using IN, but why is this?
Upvotes: 2
Views: 5448
Reputation: 8232
In response to Tim's comment, no joins are very different.
Let us say that table a has 10,000 rows, and table b has 1,000 rows. The subquery approach runs the inner query for every row of the outer query. That results in 10,000 x 1,000 = 10,000,000 rows to process.
Depending on the fields involved in the join (and whether they are indexed), the join may only have to pass over each table once, resulting in ~11,000 rows processed. It may even only have to read each row of the smaller table and discard the remainder of the larger table resulting in even fewer reads.
It all depends on the join algorithm and the database engine.
Upvotes: 1
Reputation: 499352
Becuase the subquery is being performed once for every row in the enclosing query, and the join is only performed once.
Databases are optimised for set based queries, so most of the time joins will be faster than subqueries.
You are certainly not the only one to experience this (here is one example). Looks like the query optimizer for MySql doesn't optimise subqueries of this type.
Upvotes: 1