Reputation: 12405
I have some tables with boolean column, e.g. an active column to indicate whether a user is active or not, then to query all inactive user I use
select * from users where active = 0
Most of my users are active. I understand without help the query needs to check each record, which is low efficiency and rather slow in my case. I have other table with boolean column, say a processed column to record whether an orders has been processed or not and query all unprocessed orders,
select * from orders where processed = 0
I am considering adding a helper table to record those inactive users and unprocessed orders like,
CREATE TABLE IF NOT EXISTS failedRecord (tablename text, row integer) ; //row will be rowid
I really don't like this home-made solution. I will prefer using the solution database provides but I am not sure whether using index on a boolean column will help or not. B/C I think indexes are implemented by creating a separate index table which maps a key created from the column to a row index in the indexed table. For the boolean column as the value can only be 0 or 1 I thought the mapping won't be efficient.
I use sqlite but I assume other database will have the problem as well.
Updated my question.
Most of my users are active and most of orders are processed, i.e. in my case here only a few of rows are 0, so after second thought index may be efficient. Is this the case?
Upvotes: 1
Views: 610
Reputation: 12405
I find this question was asked and answered at at sqlite mailing list Index on BOOLEAN field. I hope they are right.
To quote, "If all possible values are equally distributed, and you frequently look for a particular value, the index will help even if you have only two possible values. If you have almost all '2011' rows, and you're searching for '2011', then the index won't help much...
Actually, the break-even point is roughly 1/10: an index helps if you are selecting 10% or fewer of the records in the table; otherwise, a linear scan is faster."
"This may help if and only if a) you have many more records with FLAG=1 than with FLAG=0 (or vice versa); and b) most of the time, you are looking up the records belonging to the small subset. For example, if there's a small number of "active" or recent records that need to be processed, and a large archive of "processed" records."
Upvotes: 1
Reputation: 180080
If most users are active, or most orders are processed, then very few rows match, and you can speed up the second query by indexing the processed
column.
(If you have SQLite 3.8.0 or newer, you can avoid indexing the processed orders by using a partial index.)
Using an index is much faster and more maintainable than creating a helper table manually.
If you were using the reverse condition (so that most of the rows would match), you would have to access almost all pages of the table anyway. In that cae, there would be no mechanism that would be faster than reading all pages in order.
Upvotes: 0