Reputation: 28439
I have a table with millions of rows where one of the columns is a TIMESTAMP and against which I frequently select for date ranges. Would it improve performance any to index that column, or would that not furnish any notable improvement?
EDIT:
So, I've indexed the TIMESTAMP column. The following query
select count(*) from interactions where date(interaction_time) between date('2013-10-10') and date(now())
Takes 3.1 seconds.
There are just over 3 million records in the interactions table.
The above query produces a result of ~976k
Does this seem like a reasonable amount of time to perform this task?
Upvotes: 27
Views: 53512
Reputation: 936
Adding an index on date column definitely increases performance.
My table has 11 million rows, and a query to fetch rows which were updated on a particular date took the following time according to conditions:
Without index: ~2.5s
With index: ~5ms
Upvotes: 1
Reputation: 204
The general rule with indexes is they speed retrieval of data with large data sets, but SLOW the insertion and update of records.
If you have millions of rows, and need to select a small subset of them, then an index most likely will improve performance when doing a SELECT. (If you need most or all of them if will make little or no difference.)
Without an index, a table scan (ie read of every record to locate required ones) will occur which can be slow. With tables with only a few records, a table scan can actually be faster than an index, but this is not your situation. Another consideration is how many discrete values you have. If you only have a handful of different dates, indexing probably won't help much if at all, however if you have a wide range of dates the index will most likely help. One caveat, if the index is very big and won't fit in memory, you may not get the performance benefits you might hope for. Also you need to consider what other fields you are retrieving, joins etc, as they all have an impact. A good way to check how performance is impacted is to use the EXPLAIN statement to see how mySQL will execute the query.
Upvotes: 6
Reputation: 115660
If you want improvement on the efficiency of queries, you need 2 things:
First, index the column.
Second, and this is more important, make sure the conditions on your queries are sargable, i.e. that indexes can be used. In particular, functions should not be used on the columns. In your example, one way to write the condition would be:
WHERE interaction_time >= '2013-10-10'
AND interaction_time < (CURRENT_DATE + INTERVAL 1 DAY)
Upvotes: 42
Reputation: 255155
For
select count(*) from interactions where date(interaction_time) between date('2013-10-10') and date(now())
query to be optimized you need to do the following:
interaction_time
instead of date(interaction_time)
interaction_time
column'2013-10-10'
not date('2013-10-10')
You need #1 because indexes are only used if the columns are used in comparisons as-is, not as arguments in another expressions.
Upvotes: 3
Reputation: 425438
It would improve performance if:
To find out for sure, use EXPLAIN
to show what index is being used. Use explain before creating the index and again after - you should see that the new index is being used or not. If its being used, you can be confident performance is better.
You can also simply compare query timings.
Upvotes: 5