Reputation: 799
I'm not a SQL expert, so am trying to understand why two queries have substantially different execution times.
common_stats is a large table (3 million rows). I'm just trying to understand why the huge difference in performance.
The below query takes ~15 seconds:
select distinct cs.basesalary_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0 AND cs.pay_id is null );
This query takes ~.1 seconds:
select distinct basesalary_id from (
select cs.basesalary_id, cs.pay_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0)
) as temp
where pay_id is null;
Upvotes: 0
Views: 1152
Reputation: 142298
As a general rule:
GROUP BY
or LIMIT
), the subquery approach is better.JOIN
often creates more rows than the original tables. DISTINCT
is sometime used to deflate that row count.GROUP BY
, JOIN
may create more rows than you realize and "inflate" any aggregates (SUM
, COUNT
, etc), thereby voting for subquery instead.JOIN
.)All of those assume optimal indexes.
Your first query would probably benefit from
INDEX(pay_id, amount, basesalary_id) -- both "covering" and optimal for `WHERE`
Check out EXPLAIN SELECT ...
for both queries. Probably the faster one starts with basesalary
and there is INDEX(employee_id)
and that is very selective.
I may have more comments after seeing SHOW CREATE TABLE
.
Upvotes: 2