Why does adding a WHERE statement (on a column with an index) to my query increase my run time from seconds to minutes?

Question

My problem is with this query in MySQL:

select 
    SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
    SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
FROM parts
-- where parttypeid = 1

When the where is uncommented, my run time jumps for 4.5 seconds to 341 seconds. There are approximately 21M total records in this table.

My EXPLAIN looks like this, which seems to indicate that it is utilizing the INDEX I have on PartTypeId.

id  select_type table   type    possible_keys   key         key_len ref rows    Extra
1   SIMPLE      parts   ref     PartTypeId      PartTypeId  1       const       11090057

I created my table using this query:

CREATE TABLE IF NOT EXISTS parts (
    Id INTEGER NOT NULL PRIMARY KEY, 
    PartTypeId TINYINT NOT NULL, 
    OrderThreshold INTEGER NOT NULL, 
    PartName VARCHAR(500), 
    INDEX(Id),
    INDEX(PartTypeId),
    INDEX(OrderThreshold),
);

The query with out the WHERE returns

LOW_COUNT   HIGH_COUNT
3570        3584

With the where the results look like this:

LOW_COUNT   HIGH_COUNT
2791        2147

How can I improve the performance of my query to keep the run time down in the seconds (instead of minutes) range when adding a where statement that only looks at one column?

Gordon Linoff · Accepted Answer

Your accepted answer doesn't explain what is going wrong with your original query:

select SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
       SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1;

The index is being used to find the results, but there are a lot of rows with parttypeid = 1. I am guessing that each data page probably has at least one such row. That means that all the rows are being fetched, but they are being read out-of-order. That is slower than just doing a full table scan (as in the first query). In other words, all the data pages are being read, but the index is adding additional overhead.

As Juergen points out, a better form of the query moves the conditions into the where clause:

select SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
       SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1 AND
      (OrderThreshold < @LOW_COST OR OrderThreshold > @HIGH_COST)

(I prefer this form, because the where conditions match the case conditions.) For this query, you want an index on parts(parttypeid, OrderThreshold). I'm not sure about the MySQL optimizer in this case, but it might be better to write as:

select 'Low' as which, count(*) as CNT
from parts
where parttypeid = 1 AND
      OrderThreshold < @LOW_COST
union all
select 'High', count(*) as CNT
from parts
where parttypeid = 1 AND
      OrderThreshold > @HIGH_COST;

Each subquery should definitely use the index in this case. (If you want them in one row with two columns, there are a couple ways to achieve that, but I'm guessing that is not so important.)

Unfortunately, the best index for your query without the where clause is parts(OrderThreshold). This is a different index from the above.

Why does adding a WHERE statement (on a column with an index) to my query increase my run time from seconds to minutes?

Answers (2)

Related Questions