Reputation: 48725
Suppose I have simple table like this:
ID (PRIMARY)
time (INT)
stage (TINYINT)
other fields...
I have to do range search on time while normally selecting stage. With SQL query example:
SELECT * FROM table WHERE time>10000 AND (stage=1 OR stage=3 OR stage=4)
VERY IMPORTANT: There are a lot of rows with stage = 2, let's say 99% of the table. There are only 5 distinct stage values.
What would be the propper indexing of this table?
Upvotes: 0
Views: 79
Reputation: 47038
It depends on the distribution of the values in the different columns.
If you have very few possible stage
values you will probably have the best performance with either a separate index on time
and one on stage
or with a combined index time, stage
.
But if you have lots of distinct stage
values it might be faster to order the index the other way around: stage, time
.
But using OR
makes the stage
search more fragmented compared to an AND
search. Therefore I would try to have time
first in the index.
The only way to know for sure on your specific set of data is to try and measure, but above three mentioned candidates are my top candidates for indexes.
Edit
You might want to create a clustered index on time
, possibly time, stage
if most of your queries search by time range. This way you minimize lookup in the table once you found the correct rows in the index.
Beware that this can create a fragmented dataspace if time
is strictly increasing when you insert new records.
Upvotes: 1
Reputation: 484
Create an index with time first and stage second if there's a low cardinality in the stage-column. You can also change the stage part to AND stage IN (2, 10)
might improve performance as well as readability. :)
Good luck!
Upvotes: 1