Reputation: 128317
I realize that the answer here is probably some form of "it depends"; but I'm still wondering what (if any) the general rule of thumb is here.
If I want to enable "soft-deleting" records of a certain type, I might add a deleted
column so I can do something like this (providing a Rails example, but I don't mean for the question to be Rails-specific):
class SomeModel < ActiveRecord::Base
default_scope { where(:deleted => false) }
end
In this case I would imagine that indexing the deleted
column would be relatively cheap as there are only two possible values: true
or false
.
Of course it might be useful to know when a record was deleted; so I could go with deleted_at
instead:
class SomeModel < ActiveRecord::Base
default_scope { where(:deleted_at => nil) }
end
I've definitely seen this sort of thing quite a bit in the wild. My question is whether the indexing is actually much more expensive in this case, since basically every record will have a different value for deleted_at
*. I know almost zero about database internals (something I definitely would like to read up on some day); so I recognize that this is probably a naive question.
And I know it probably depends on the database as well; if it matters, I'm mostly concerned with PostgreSQL. But I would imagine there are others who would also like to know about MySQL, Oracle, etc.
So? Little help?
* As I typed that sentence I think I realized the answer, since obviously every record has a different value for id
as well but you don't think twice about indexing that. So probably date/time values work the same way? Anyway, I'd prefer an answer from someone with real knowledge on the subject rather than my hunch :)
Upvotes: 0
Views: 108
Reputation: 7284
The high over view of How database index works could be seen here.
Generally When designing indexes,
deleted
field in the question, or the classical sample of gender
filed that contains two distinct values: male or female. One advantage of bitmap indexes is that a number of them can be ANDed and ORed together to answer queries very efficiently.
The Working of indexes in postgresql and some guidance on index strategies, may come in handy.
Upvotes: 2
Reputation: 52346
Indexing the deleted_at values would only be useful if you had a fairly frequent need to identify records that were deleted in a specific time range. If you do not need that ability -- if you just wanted to know when records identified by some other means were deleted, for example -- then it would not be worth indexing.
Of course once a record is deleted you'd probably expect it to never be modified again, so you could in practice use updated_at as the deletion timestamp.
If you want to identify non-deleted records quickly then if your RDBMS supports it a partial index would be appropriate. Syntax varies by system, but in brief you'd be looking to include in your indexes only the rows for which deleted_at is not null (or is_deleted => false).
Upvotes: 1
Reputation: 8706
The size of the index is mainly influenced by two things:
Besides that, there is a overhead involved which you can generally not influence (e.g., the B-tree itself, references to the primary table store).
Answering your question: yes, a binary index might be smaller and faster, but not because of the number of distinct value, but just because one binary field needs less space than a date.
If you want to learn about SQL indexing, have a look at my site http://use-the-index-luke.com/
Upvotes: 1