Džuris
Džuris

Reputation: 2254

Why does removing any of the wheres makes the query fast?

I will present a simplified version of my query:

SELECT item.prop1, item.prop2,
       COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
  INNER JOIN parents ON item.parent_id = parents.id
  INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
  INNER JOIN pivot ON pivot.item_id = items.id
  INNER JOIN users ON pivot.user_id = users.id
WHERE
  items.prop4 IS NULL
  AND COALESCE(parents.prop5, grandparents.prop5) = 8
  AND users.country_id IN (123)

This query is a bit slower than I can afford - it runs about 0.7 seconds.

While trying to optimize, I noticed that removing any of the last two lines make it run in 0.01-0.02 seconds.

I tried to split the filtering in two steps:

SELECT item.prop1, item.prop2, p3
FROM (
    SELECT item.prop1, item.prop2,
           COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3,
           users.country_id as country
    FROM items
      INNER JOIN parents ON item.parent_id = parents.id
      INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
      INNER JOIN pivot ON pivot.item_id = items.id
      INNER JOIN users ON pivot.user_id = users.id
    WHERE
      items.prop4 IS NULL
      AND COALESCE(parents.prop5, grandparents.prop5) = 8
) AS temp
WHERE temp.country IN (123)

This made no difference in the total running time. When I tried to execute only the subquery, it completed in about 0.01 seconds and return about 1200 rows. I would expect that filtering by these 1200 rows down to 570 by a single column shouldn't cost me half a second, should it?

I also tried another hierarchy like this:

SELECT item.prop1, item.prop2,
       COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
  INNER JOIN parents ON item.parent_id = parents.id
  INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
  INNER JOIN (
    SELECT pivot.item_id as item
    FROM pivot
      INNER JOIN users ON pivot.user_id = users.id
    WHERE users.country_id IN (123)
  ) as country_items ON country_items.item = items.id
WHERE
  items.prop4 IS NULL
  AND COALESCE(parents.prop5, grandparents.prop5) = 8

Once again the subquery is fast (0.005 seconds, returns about 200000 rows) but whole thing together runs really slow.

What could cause such performance dip? I am almost leaning to involve my application to execute the last statement in two steps...

Working solution

Selecting this constraint and employing HAVING solves the issue - this query completes in 0.015 seconds.

SELECT item.prop1, item.prop2,
       COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3, users.country_id as country
FROM items
  INNER JOIN parents ON item.parent_id = parents.id
  INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
  INNER JOIN pivot ON pivot.item_id = items.id
  INNER JOIN users ON pivot.user_id = users.id
WHERE
  items.prop4 IS NULL
  AND COALESCE(parents.prop5, grandparents.prop5) = 8
HAVING country IN (123)

I still do not understand the reasons here. I thought COALESCE was the slow one in WHERE and I tried selecting and HAVING it first but it didn't change the performance at all.

But how is this query different from my first attempt where I selected the same thing in a subquery? It seems like the same thing - a query selects 1200 rows and 570 of those are filtered out according to the country constraint.

Upvotes: 0

Views: 61

Answers (2)

Džuris
Džuris

Reputation: 2254

I am not a MySQL guru but the issue is solved and here is what I think (based on some EXPLAINing by mysql) was going on. The problem seems to that both of the problematic conditions

  AND COALESCE(parents.prop5, grandparents.prop5) = 8
  AND users.country_id IN (123)

are at the most distant ends of the joins. MySQL always decided to start working with the users table making it build a huge temporary table and only applying COALESCE on the final joined temporary table.

If AND users.country_id IN (123) is removed, MySQL can COALESCE(parents.prop5, grandparents.prop5) right away, test that condition, join the other tables and throw out the output. If AND COALESCE(parents.prop5, grandparents.prop5) = 8 is removed, there is no need to make a temporary table for filtering - all the WHERE conditions can be applied separately before joining it all.

The following queries that I tried are different declarations for the same thing. I didn't know that MySQL will notice that and execute them in the very same order.

SELECT item.prop1, item.prop2,
       COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
  INNER JOIN parents ON item.parent_id = parents.id
  INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
  INNER JOIN pivot ON pivot.item_id = items.id
  INNER JOIN users ON pivot.user_id = users.id
WHERE
  items.prop4 IS NULL
  AND COALESCE(parents.prop5, grandparents.prop5) = 8
  AND users.country_id IN (123)

SELECT item.prop1, item.prop2, p3
FROM (
    SELECT item.prop1, item.prop2,
           COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3,
           users.country_id as country
    FROM items
      INNER JOIN parents ON item.parent_id = parents.id
      INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
      INNER JOIN pivot ON pivot.item_id = items.id
      INNER JOIN users ON pivot.user_id = users.id
    WHERE
      items.prop4 IS NULL
      AND COALESCE(parents.prop5, grandparents.prop5) = 8
) AS temp
WHERE temp.country IN (123)

The third query once again starts with filtering users and then joins everything together to apply COALESCE when everything is read.

SELECT item.prop1, item.prop2,
       COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3
FROM items
  INNER JOIN parents ON item.parent_id = parents.id
  INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
  INNER JOIN (
    SELECT pivot.item_id as item
    FROM pivot
      INNER JOIN users ON pivot.user_id = users.id
    WHERE users.country_id IN (123)
  ) as country_items ON country_items.item = items.id
WHERE
  items.prop4 IS NULL
  AND COALESCE(parents.prop5, grandparents.prop5) = 8

The solution

This query finally makes it start from the fat end: join a couple of tables in memory, apply coalesce, then take only the necessary rows (passing constraints) in the following joins.

SELECT item.prop1, item.prop2,
       COALESCE(item.prop3, parents.prop3, grandparents.prop3) AS p3, users.country_id as country
FROM items
  INNER JOIN parents FORCE INDEX (parents_grandparent_id) ON item.parent_id = parents.id
  INNER JOIN grandparents ON parents.grandparent_id = grandparents.id
  INNER JOIN pivot ON pivot.item_id = items.id
  INNER JOIN users ON pivot.user_id = users.id
WHERE
  items.prop4 IS NULL
  AND COALESCE(parents.prop5, grandparents.prop5) = 8

Upvotes: 0

Fotis
Fotis

Reputation: 1362

Using WHERE forces the query engine to perform additional lookups to the table's fields whereas a query without a WHERE would simply return the results.

In order to make this query run faster, you probably need to add indexes to your table.

First of all run the same query by adding EXPLAIN before your SELECT. This will give you a nice overview of how many rows were examined, what's the key cardinality etc.

If this is a standard query you're using, I suggest you add an index for all 3 columns. If you plan to query the columns separately, you can add an index to each column you need to query, but don't overdo it as the table will become slower.

If you run EXPLAIN after you've added the indexes you'd get a significantly lower count of rows examined I guess.

If your table already has indexes, you can hint the MySQL server by using a USE INDEX hint.

Upvotes: 1

Related Questions