Reputation: 8560
I have two mysql scripts:
select
p.*
from
`products` p
JOIN dispensaries d ON (p.dispensary_id = d.id)
JOIN dispensary_locations dl ON(dl.dispensary_id = d.id AND dl.is_primary = 1)
JOIN dispensary_location_zip_codes dlzc ON(dlzc.dispensary_location_id = dl.id AND dlzc.zip_code = '941033')
and p.`is_hidden` = 0
and p.`deleted_at` is null
GROUP BY p.id
order by p.`points` desc
limit 12
and
select
*
from
`products`
where
exists (
select
*
from
`dispensaries`
where
`products`.`dispensary_id` = `dispensaries`.`id`
and exists (
select
*
from
`dispensary_locations`
where
`dispensary_locations`.`dispensary_id` = `dispensaries`.`id`
and exists (
select *
from
`dispensary_location_zip_codes`
where
`dispensary_location_zip_codes`.`dispensary_location_id` = `dispensary_locations`.`id` and `zip_code` = '941033'
)
AND is_primary = 1
)
)
order by `points` desc
limit 10;
Logical they should be the same but on my database first one takes 60 ms when zip_code exists, 30 ms when it does not exist. Second one takes 5 ms when zip_code exists, 9500 ms when zip_code does not exist, does any know what is happening here ?
I have about 10 000 products in database. Times are 60 ms and 30 ms for first script, 5 ms and 9500 ms (9,5 seconds) for second script...
Upvotes: 0
Views: 1180
Reputation: 418
Why is Mysql “where exists” much slower than “join”?
It's not slower in all the cases. It depends on many factors like size of each table, index on joined columns, existence of value(especially for exists statement) etc.
For example, Let p
and q
be the number of entries in each of the tables.
By default Exists
does a nested loop and It stops execution as soon as it finds something. So, complexity for worst case scenario (If value doesn't exists) is O(p*q)
.
Rough sketch of how DB Engine joins tables
Nested join
O(p*q)
O(p*logq)
Hash join
Prepares a hash table of the smaller relation with join attributes and rows. Scan the larger relation.
If a tables are small enough to fit in memory
Complexity, As expected - O(p+q)
Merge Join
If both the tables are in same sort order. Both are run through in order and matched up where they correspond.
If both the tables have an index on joined column, index already maintains them in order. So, complexity - O(p+q)
O(p+qlogq)
O(plogq+qlogq)
In above scenario, when zip_codes doesn't exists, database engine had to do a nested loop over all the entries (O(p*d*dl*dlz)
) in query 2 (exists) where as in query 1 database engine applied a optimized join to get the results.
When zip_codes exists, In query 2 (exists) scenario, It found matching entries without doing a nested loop over all the entries.
Upvotes: 5