Reputation: 33
I am working with a table that has a "state" column, which typically holds only 2 or 3 different values. Sometimes, when this table holds several million rows, the following SQL statement becomes slow (I assume a full table scan is done):
SELECT state, count(*) FROM mytable GROUP BY state
I expect to get something like this:
disabled | 500000
enabled | 2000000
(basically I want to know how many items are "enabled" and how many are "disabled" - in my real application that column actually holds a number rather than text)
I guess adding an index on my state column is pretty useless, since it contains only a handful of distinct values. What other options do I have?
There is also a "timestamp" column (with an index). Ideally the solution should also work well if I add:
WHERE timestamp BETWEEN x AND y
Right now I'm using an SQLite3 database, but it looks like other database engines are not too different, so solutions for other DB engines might be interesting as well.
Thank you!
Upvotes: 3
Views: 1638
Reputation: 73236
I would put a covering index on timestamp,state (in that order). The rationale is:
the condition on the timestamp will be much more selective than the state
if the state is also in the index (i.e. a covering index), the engine only has to perform a range scan on the index itself, without having to pay for random I/Os to access the main data of the table.
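A minimal sketch of what this could look like in SQLite, using the table and column names from the question (the index name is just an example):

-- covering index: timestamp first (for the range condition), state second (so the index covers the query)
CREATE INDEX idx_mytable_ts_state ON mytable (timestamp, state);

-- this query can then be answered by a range scan over the index alone
SELECT state, count(*)
FROM mytable
WHERE timestamp BETWEEN x AND y
GROUP BY state;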
Note: if the timestamp range is too wide, it will become slow despite the index, because random I/Os are more expensive than sequential I/Os. There is a point where the index range scan becomes more expensive than the table scan. As a rule of thumb, if you need to scan more than 10% of the table, the engine should consider keeping the table scan and ignoring the index. I'm not sure SQLite is smart enough to make this kind of optimization, though.
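You can check which plan SQLite actually picks with EXPLAIN QUERY PLAN (the exact wording of the output differs between SQLite versions):

EXPLAIN QUERY PLAN
SELECT state, count(*)
FROM mytable
WHERE timestamp BETWEEN x AND y
GROUP BY state;

-- output mentioning "USING COVERING INDEX" means the index is being used;
-- a plain "SCAN" of mytable means it fell back to the full table scan.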
Upvotes: 2