What are the perf implications of indexing a datetime column?

I realize that the answer here is probably some form of "it depends"; but I'm still wondering what (if any) the general rule of thumb is here.

If I want to enable "soft-deleting" records of a certain type, I might add a deleted column so I can do something like this (providing a Rails example, but I don't mean for the question to be Rails-specific):

class SomeModel < ActiveRecord::Base
  default_scope { where(:deleted => false) }
end

In this case I would imagine that indexing the deleted column would be relatively cheap as there are only two possible values: true or false.

Of course it might be useful to know when a record was deleted; so I could go with deleted_at instead:

class SomeModel < ActiveRecord::Base
  default_scope { where(:deleted_at => nil) }
end

I've definitely seen this sort of thing quite a bit in the wild. My question is whether the indexing is actually much more expensive in this case, since basically every record will have a different value for deleted_at*. I know almost zero about database internals (something I definitely would like to read up on some day); so I recognize that this is probably a naive question.

And I know it probably depends on the database as well; if it matters, I'm mostly concerned with PostgreSQL. But I would imagine there are others who would also like to know about MySQL, Oracle, etc.

So? Little help?

^{* As I typed that sentence I think I realized the answer, since obviously every record has a different value for id as well but you don't think twice about indexing that. So probably date/time values work the same way? Anyway, I'd prefer an answer from someone with real knowledge on the subject rather than my hunch :)}

Upvotes: 0

Answers (3)

Mohsen Heydari

Reputation: 7284

The high over view of How database index works could be seen here.

Generally When designing indexes,

Tree-based indexing(default strategy) is used to high-cardinality columns when values of the column repeats rarely AKA forgien key columns.
Bitmap indexes have traditionally been considered to work well for low-cardinality columns when values of the column repeats very frequently like deleted field in the question, or the classical sample of gender filed that contains two distinct values: male or female.

One advantage of bitmap indexes is that a number of them can be ANDed and ORed together to answer queries very efficiently.

The Working of indexes in postgresql and some guidance on index strategies, may come in handy.

Upvotes: 2

David Aldridge

Reputation: 52346

Indexing the deleted_at values would only be useful if you had a fairly frequent need to identify records that were deleted in a specific time range. If you do not need that ability -- if you just wanted to know when records identified by some other means were deleted, for example -- then it would not be worth indexing.

Of course once a record is deleted you'd probably expect it to never be modified again, so you could in practice use updated_at as the deletion timestamp.

If you want to identify non-deleted records quickly then if your RDBMS supports it a partial index would be appropriate. Syntax varies by system, but in brief you'd be looking to include in your indexes only the rows for which deleted_at is not null (or is_deleted => false).

Upvotes: 1

Markus Winand

Reputation: 8706

The size of the index is mainly influenced by two things:

The index key length (The number of bytes the each key value occupies).
The number of index rows (obviously)

Besides that, there is a overhead involved which you can generally not influence (e.g., the B-tree itself, references to the primary table store).

Answering your question: yes, a binary index might be smaller and faster, but not because of the number of distinct value, but just because one binary field needs less space than a date.

If you want to learn about SQL indexing, have a look at my site http://use-the-index-luke.com/

Upvotes: 1

What are the perf implications of indexing a datetime column?

Answers (3)

Related Questions