fico7489
fico7489

Reputation: 8560

Why is Mysql "where exists" much slower than "join"?

I have two mysql scripts:

select
  p.*
from
    `products` p
    JOIN dispensaries d ON (p.dispensary_id = d.id)
    JOIN dispensary_locations dl ON(dl.dispensary_id = d.id AND dl.is_primary = 1)
    JOIN dispensary_location_zip_codes dlzc ON(dlzc.dispensary_location_id = dl.id AND dlzc.zip_code = '941033')
    and p.`is_hidden` = 0
    and p.`deleted_at` is null
GROUP BY p.id
order by p.`points` desc
limit 12

and

select
  *
from
  `products`
where
  exists (
    select
      *
    from
      `dispensaries`
    where
      `products`.`dispensary_id` = `dispensaries`.`id`
      and exists (
        select
          *
        from
          `dispensary_locations`
        where
          `dispensary_locations`.`dispensary_id` = `dispensaries`.`id`
          and exists (
            select *
            from
              `dispensary_location_zip_codes`
            where
              `dispensary_location_zip_codes`.`dispensary_location_id` = `dispensary_locations`.`id` and `zip_code` = '941033'
          )
          AND is_primary = 1
      )
  )
order by `points` desc
limit 10;

Logical they should be the same but on my database first one takes 60 ms when zip_code exists, 30 ms when it does not exist. Second one takes 5 ms when zip_code exists, 9500 ms when zip_code does not exist, does any know what is happening here ?

I have about 10 000 products in database. Times are 60 ms and 30 ms for first script, 5 ms and 9500 ms (9,5 seconds) for second script...

Upvotes: 0

Views: 1180

Answers (1)

Prasanth
Prasanth

Reputation: 418

Why is Mysql “where exists” much slower than “join”?

It's not slower in all the cases. It depends on many factors like size of each table, index on joined columns, existence of value(especially for exists statement) etc.

For example, Let p and q be the number of entries in each of the tables.

By default Exists does a nested loop and It stops execution as soon as it finds something. So, complexity for worst case scenario (If value doesn't exists) is O(p*q).

Rough sketch of how DB Engine joins tables

Nested join

  • Compares every record in the table with every record in the other
  • This join is efficient if one of the tables is extremely small
  • Complexity if join columns are not indexed - O(p*q)
  • Complexity if join columns are indexed - O(p*logq)

Hash join

  • Prepares a hash table of the smaller relation with join attributes and rows. Scan the larger relation.

  • If a tables are small enough to fit in memory

  • Complexity, As expected - O(p+q)

Merge Join

  • If both the tables are in same sort order. Both are run through in order and matched up where they correspond.

  • If both the tables have an index on joined column, index already maintains them in order. So, complexity - O(p+q)

  • If one table have an index on joined column, only one table has to be sorted before the merge step can happen . So, complexity - O(p+qlogq)
  • If both the tables don't have index on joined column, both tables has to be sorted before the merge step can happen . So, complexity - O(plogq+qlogq)

In above scenario, when zip_codes doesn't exists, database engine had to do a nested loop over all the entries (O(p*d*dl*dlz)) in query 2 (exists) where as in query 1 database engine applied a optimized join to get the results.

When zip_codes exists, In query 2 (exists) scenario, It found matching entries without doing a nested loop over all the entries.

Upvotes: 5

Related Questions