Reputation: 4928
I have a table with 200 million rows where index is created in "created_at" column which is datetime datatype.
show create table [tablename] outputs:
create table `table`
(`created_at` datetime NOT NULL)
PRIMARY KEY (`id`)
KEY `created_at_index` (`created_at`)
ENGINE=InnoDB AUTO_INCREMENT=208512112 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'
created_at ranges from 2020-04-01 ~ 2020-05-28.
I want to get only rows that are beyond 2020-05-15 23:00:00.
when I run :
EXPLAIN SELECT created_at
FROM table
where created_at >= '2020-05-15 23:00:00';
it says it outputs:
rows Extra
200mil Using Where
My understanding is that in RDMS if no index rows are not ordered but when you create an index on a column it is in sorted order therefore right after finding '2020-05-15 23:00:00' it will simply return all rows after that.
Also since its cardinalty is 7mil I'd thought using an index would be better than full table scan.
Is it because I've inputted date as a string? but when I try
where created_at >= date('2020-05-15 23:00:00');
still the same.
and
where created_at >= datetime('2020-05-15 23:00:00');
outputs syntax error.
Did mysql just decide it would be more efficient to do a full table scan?
EDIT:
using equals
EXPLAIN SELECT created_at
FROM table
where created_at = '2020-05-15';
outputs:
key_len ref rows Extra
5 const 51
In where clause if I change string to date('2020-05-15') it outputs:
key_len ref rows Extra
5 const 51 Using index condition
does this mean that first equal query one didn't use an index?
Upvotes: 4
Views: 4644
Reputation: 142278
If the values are evenly distributed, about 25% of the rows are >= '2020-05-15 23:00:00'
Yes, Mysql will prefer a full table scan to using the index when you have such a large percentage of the table needed.
See Why does MySQL not always use index for select query?
In a DATE
context, date('2020-05-15 23:00:00')
is the same as '2020-05-15'
.
In a DATETIME
context, datetime('2020-05-15 23:00:00')
is the same as '2020-05-15 23:00:00'
.
Using index
means that the INDEX
is "covering", which means that the entire query can be performed entirely in the index's BTree -- without reaching over to the data's BTree.
Using index condition
means something quite different -- it has to do with a minor optimization relating to the two layers ("handler" and "engine") in MySQL's design. (More details in "ICP" aka "Index Condition Pushdown".)
Upvotes: 1
Reputation: 222432
All of your queries would take advantage of an index on column created_at
. MySQL always uses an index when it matches the predicate(s) of the where
clause.
The output of your explain
s do indicate that you do not have this index in place, which is confirmed by the output of your create table
.
Just create the index and your database will use it.
Here is a demo:
-- sample table, without the index
create table mytable(id int, created_at datetime);
-- the query does a full scan, as no index is available
explain select created_at from mytable where created_at >= '2020-05-15 23:00:00';
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra -: | :---------- | :------ | :--------- | :--- | :------------ | :--- | :------ | :--- | ---: | -------: | :---------- 1 | SIMPLE | mytable | null | ALL | null | null | null | null | 1 | 100.00 | Using where
-- now add the index
create index idx_mytable_created_at on mytable(created_at);
-- the query uses the index
explain select created_at from mytable where created_at >= '2020-05-15 23:00:00';
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra -: | :---------- | :------ | :--------- | :---- | :--------------------- | :--------------------- | :------ | :--- | ---: | -------: | :----------------------- 1 | SIMPLE | mytable | null | index | idx_mytable_created_at | idx_mytable_created_at | 6 | null | 1 | 100.00 | Using where; Using index
Upvotes: 1