cdm
cdm

Reputation: 799

MySQL Query Performance - Subquery Vs Join

I'm not a SQL expert, so am trying to understand why two queries have substantially different execution times.

common_stats is a large table (3 million rows). I'm just trying to understand why the huge difference in performance.

The below query takes ~15 seconds:

select distinct cs.basesalary_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0 AND cs.pay_id is null );

This query takes ~.1 seconds:

select distinct basesalary_id from (
    select cs.basesalary_id, cs.pay_id
    from common_stats AS cs
    LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
    WHERE (b.employee_id= 100 AND cs.amount > 0)
) as temp
where pay_id is null;

Upvotes: 0

Views: 1152

Answers (1)

Rick James
Rick James

Reputation: 142298

As a general rule:

  • If the subquery 'shrinks' the number of rows (say, via GROUP BY or LIMIT), the subquery approach is better.
  • JOIN often creates more rows than the original tables. DISTINCT is sometime used to deflate that row count.
  • If the outer query has a GROUP BY, JOIN may create more rows than you realize and "inflate" any aggregates (SUM, COUNT, etc), thereby voting for subquery instead.
  • More than one subquery would lead to poor optimization. (Thereby voting for JOIN.)

All of those assume optimal indexes.

Your first query would probably benefit from

INDEX(pay_id, amount, basesalary_id) -- both "covering" and optimal for `WHERE`

Check out EXPLAIN SELECT ... for both queries. Probably the faster one starts with basesalary and there is INDEX(employee_id) and that is very selective.

I may have more comments after seeing SHOW CREATE TABLE.

Upvotes: 2

Related Questions