Reputation: 2254
I will present a simplified version of my query:
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN pivot ON pivot.item_id = items.id
INNER JOIN users ON pivot.user_id = users.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
AND users.country_id IN (123)
This query is a bit slower than I can afford - it runs about 0.7 seconds.
While trying to optimize, I noticed that removing any of the last two lines make it run in 0.01-0.02 seconds.
I tried to split the filtering in two steps:
SELECT item.prop1, item.prop2, p3
FROM (
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3,
users.country_id as country
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN pivot ON pivot.item_id = items.id
INNER JOIN users ON pivot.user_id = users.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
) AS temp
WHERE temp.country IN (123)
This made no difference in the total running time. When I tried to execute only the subquery, it completed in about 0.01 seconds and return about 1200 rows. I would expect that filtering by these 1200 rows down to 570 by a single column shouldn't cost me half a second, should it?
I also tried another hierarchy like this:
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN (
SELECT pivot.item_id as item
FROM pivot
INNER JOIN users ON pivot.user_id = users.id
WHERE users.country_id IN (123)
) as country_items ON country_items.item = items.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
Once again the subquery is fast (0.005 seconds, returns about 200000 rows) but whole thing together runs really slow.
What could cause such performance dip? I am almost leaning to involve my application to execute the last statement in two steps...
Selecting this constraint and employing HAVING
solves the issue - this query completes in 0.015 seconds.
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3, users.country_id as country
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN pivot ON pivot.item_id = items.id
INNER JOIN users ON pivot.user_id = users.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
HAVING country IN (123)
I still do not understand the reasons here. I thought COALESCE
was the slow one in WHERE
and I tried selecting and HAVING
it first but it didn't change the performance at all.
But how is this query different from my first attempt where I selected the same thing in a subquery? It seems like the same thing - a query selects 1200 rows and 570 of those are filtered out according to the country constraint.
Upvotes: 0
Views: 61
Reputation: 2254
I am not a MySQL guru but the issue is solved and here is what I think (based on some EXPLAIN
ing by mysql) was going on. The problem seems to that both of the problematic conditions
AND COALESCE(parents.prop5, grandparents.prop5) = 8
AND users.country_id IN (123)
are at the most distant ends of the joins. MySQL always decided to start working with the users
table making it build a huge temporary table and only applying COALESCE
on the final joined temporary table.
If AND users.country_id IN (123)
is removed, MySQL can COALESCE(parents.prop5, grandparents.prop5)
right away, test that condition, join the other tables and throw out the output. If AND COALESCE(parents.prop5, grandparents.prop5) = 8
is removed, there is no need to make a temporary table for filtering - all the WHERE
conditions can be applied separately before joining it all.
The following queries that I tried are different declarations for the same thing. I didn't know that MySQL will notice that and execute them in the very same order.
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN pivot ON pivot.item_id = items.id
INNER JOIN users ON pivot.user_id = users.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
AND users.country_id IN (123)
SELECT item.prop1, item.prop2, p3
FROM (
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3,
users.country_id as country
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN pivot ON pivot.item_id = items.id
INNER JOIN users ON pivot.user_id = users.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
) AS temp
WHERE temp.country IN (123)
The third query once again starts with filtering users
and then joins everything together to apply COALESCE
when everything is read.
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
INNER JOIN parents ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN (
SELECT pivot.item_id as item
FROM pivot
INNER JOIN users ON pivot.user_id = users.id
WHERE users.country_id IN (123)
) as country_items ON country_items.item = items.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
This query finally makes it start from the fat end: join a couple of tables in memory, apply coalesce, then take only the necessary rows (passing constraints) in the following joins.
SELECT item.prop1, item.prop2,
COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3, users.country_id as country
FROM items
INNER JOIN parents FORCE INDEX (parents_grandparent_id) ON item.parent_id = parents.id
INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
INNER JOIN pivot ON pivot.item_id = items.id
INNER JOIN users ON pivot.user_id = users.id
WHERE
items.prop4 IS NULL
AND COALESCE(parents.prop5, grandparents.prop5) = 8
Upvotes: 0
Reputation: 1362
Using WHERE forces the query engine to perform additional lookups to the table's fields whereas a query without a WHERE would simply return the results.
In order to make this query run faster, you probably need to add indexes to your table.
First of all run the same query by adding EXPLAIN
before your SELECT
. This will give you a nice overview of how many rows were examined, what's the key cardinality etc.
If this is a standard query you're using, I suggest you add an index for all 3 columns. If you plan to query the columns separately, you can add an index to each column you need to query, but don't overdo it as the table will become slower.
If you run EXPLAIN
after you've added the indexes you'd get a significantly lower count of rows examined I guess.
If your table already has indexes, you can hint the MySQL server by using a USE INDEX
hint.
Upvotes: 1