Yevgeny Simkin

Reputation: 28439

Does it improve performance to index a date column?

I have a table with millions of rows where one of the columns is a TIMESTAMP and against which I frequently select for date ranges. Would it improve performance any to index that column, or would that not furnish any notable improvement?

EDIT:

So, I've indexed the TIMESTAMP column. The following query takes 3.1 seconds:

    select count(*) from interactions where date(interaction_time) between date('2013-10-10') and date(now())

There are just over 3 million records in the interactions table.

The above query produces a result of ~976k

Does this seem like a reasonable amount of time to perform this task?

Upvotes: 27

Views: 53512

Answers (5)

vkrishna17

Reputation: 936

Adding an index on a date column definitely improves performance.

My table has 11 million rows, and a query to fetch the rows updated on a particular date took the following times with and without the index:

Without index: ~2.5s

With index: ~5ms

Upvotes: 1

vkiwi

Reputation: 204

The general rule with indexes is they speed retrieval of data with large data sets, but SLOW the insertion and update of records.

If you have millions of rows, and need to select a small subset of them, then an index most likely will improve performance when doing a SELECT. (If you need most or all of them it will make little or no difference.)

Without an index, a table scan (i.e. a read of every record to locate the required ones) will occur, which can be slow. With tables with only a few records, a table scan can actually be faster than an index, but this is not your situation.

Another consideration is how many distinct values you have. If you only have a handful of different dates, indexing probably won't help much, if at all. However, if you have a wide range of dates, the index will most likely help.

One caveat: if the index is very big and won't fit in memory, you may not get the performance benefits you might hope for. You also need to consider what other fields you are retrieving, joins, etc., as they all have an impact.

A good way to check how performance is impacted is to use the EXPLAIN statement to see how MySQL will execute the query.
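As a rough sketch of how to check the number of distinct values, the statements below use the table and column names from the question. The first query scans the whole column, so it may itself take a while on a large table; the figure MySQL reports for an index is only an estimate.

```sql
-- How many distinct values does the column hold, relative to the row count?
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT interaction_time) AS distinct_values
FROM interactions;

-- The Cardinality column shows MySQL's own estimate for each existing index.
SHOW INDEX FROM interactions;
```

If distinct_values is close to total_rows, the column is highly selective and a range condition on it is a good candidate for an index.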

Upvotes: 6

ypercubeᵀᴹ

Reputation: 115660

If you want improvement on the efficiency of queries, you need 2 things:

  • First, index the column.

  • Second, and this is more important, make sure the conditions on your queries are sargable, i.e. that indexes can be used. In particular, functions should not be used on the columns. In your example, one way to write the condition would be:

    WHERE interaction_time >= '2013-10-10' 
      AND interaction_time < (CURRENT_DATE + INTERVAL 1 DAY)
    

Upvotes: 42

zerkms

Reputation: 255155

To optimize the query

select count(*) from interactions where date(interaction_time) between date('2013-10-10') and date(now())

you need to do the following:

  1. Use just interaction_time instead of date(interaction_time)
  2. Create an index that covers interaction_time column
  3. (optional) Use just '2013-10-10' not date('2013-10-10')

You need #1 because an index can only be used when the column appears in the comparison as-is, not as an argument to a function or other expression.
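Put together, the three steps might look like the sketch below. The index name idx_interaction_time is an assumption made up for illustration. The upper bound is written as "before tomorrow" so that every row from today is still counted, which is what the original between ... and date(now()) condition returned.

```sql
-- Step 2: index the column (the index name is hypothetical).
CREATE INDEX idx_interaction_time ON interactions (interaction_time);

-- Steps 1 and 3: compare the bare column against plain literals.
-- The half-open range (>= start, < tomorrow) includes all of today.
SELECT COUNT(*)
FROM interactions
WHERE interaction_time >= '2013-10-10'
  AND interaction_time <  CURRENT_DATE + INTERVAL 1 DAY;
```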

Upvotes: 3

Bohemian

Reputation: 425438

It would improve performance if:

  • there are at least "several" different values
  • your query uses a date range that would select less than "most" of the rows

To find out for sure, use EXPLAIN to show which index is being used. Run EXPLAIN before creating the index and again after; you should see whether or not the new index is being used. If it's being used, you can be fairly confident performance is better.

You can also simply compare query timings.
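A minimal sketch of that check, assuming the table and column from the question and a query written against the bare column. The exact output depends on your MySQL version and data, so treat the notes in the comments as what to look for rather than guaranteed results.

```sql
-- Run once before creating the index and once after, then compare the output.
EXPLAIN
SELECT COUNT(*)
FROM interactions
WHERE interaction_time >= '2013-10-10'
  AND interaction_time < CURRENT_DATE + INTERVAL 1 DAY;

-- Columns worth reading in the result:
--   type : ALL means a full table scan; range means an index range scan
--   key  : the index actually chosen, or NULL if none was used
--   rows : the optimizer's estimate of how many rows it must examine
```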

Upvotes: 5
