Ethan
Ethan

Reputation: 4995

Index design for queries using 2 ranges

I am trying to find out how to design the indexes for my data when my query is using ranges for 2 fields.

expenses_tbl:
idx        date     category      amount
auto-inc   INT       TINYINT      DECIMAL(7,2)
PK

The column category defines the type of expense. Like, entertainment, clothes, education, etc. The other columns are obvious.

One of my query on this table is to find all those instances where for a given date range, the expense has been more than $50. This query will look like:

SELECT date, category, amount 
FROM expenses_tbl
WHERE date > 120101 AND date < 120811 
      AND amount > 50.00;

How do I design the index/secondary index on this table for this particular query.

Assumption: The table is very large (It's not currently, but that gives me a scope to learn).

Upvotes: 5

Views: 3569

Answers (3)

Mark Byers
Mark Byers

Reputation: 838156

A couple more points that have not been mentioned yet:

The order of the columns in the index can make a difference. You may want to try both of these indexes:

(date, amount)
(amount, date)

Which to pick? Generally you want the most selective condition be the first column in the index.

  • If your date ranges are large but few expenses are over $50 then you want amount first in the index.
  • If you have narrow date ranges and most of the expenses are over $50 then you should put date first.
  • If both indexes are present then MySQL will choose the index with the lowest estimated cost.

You can try adding both indexes and then look at the output of EXPLAIN SELECT ... to see which index MySQL chooses for your query.


You may also want to consider a covering index. By including the column category in the index (as the last column) it means that all the data required for your query is available in the index, so MySQL does not need to look at the base table at all to get the results for your query.

Upvotes: 1

Joshua Martell
Joshua Martell

Reputation: 7212

MySQL generally doesn't support ranges on multiple parts of a compound index. Either it will use the index for the date, or an index for the amount, but not both. It might do an index merge if you had two indexes, one on each, but I'm not sure.

I'd check the EXPLAIN before and after adding these indexes:

CREATE INDEX date_idx ON expenses_tbl (date);
CREATE INDEX amount_idx ON expenses_tbl (amount);

Compound index ranges - http://dev.mysql.com/doc/refman/5.5/en/range-access-multi-part.html

Index Merge - http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269703

The general answer to your question is that you want a composite index, with two keys. The first being date and the second being the amount.

Note that this index will work for queries with restrictions on the date or on the date and on the expense. It will not work for queries with restrictions on the expense only. If you have both types, you might want a second index on expense.

If the table is really, really large, then you might want to partition it by date and build indexes on expense within each partition.

Upvotes: 0

Related Questions