Reputation: 13
We have a database with a data table in it defined as follows:
CREATE TABLE data
(
msts BIGINT,
variable_id INT,
p_id INT,
value DOUBLE,
PRIMARY KEY(msts,variable_id,p_id)
);
ALTER TABLE data
ADD FOREIGN KEY (p_id)
REFERENCES p(id);
ALTER TABLE data
ADD FOREIGN KEY (variable_id)
REFERENCES variables(id);
This table can contain billions of records.
When doing a simple query:
SELECT COUNT(msts) from data FORCE INDEX(PRIMARY) where (
msts<1535886000000000000 AND msts>1535796060000000000 AND
variable_id=107 AND p_id=661 );
yielding:
+-------------+
| COUNT(msts) |
+-------------+
|       89873 |
+-------------+
1 row in set (42.51 sec)
takes 42.51 seconds to count 89873 rows.
Why does it take so long, given that the primary key should act as a composite index?
This is the explain:
EXPLAIN SELECT COUNT(msts) from data FORCE INDEX(PRIMARY) where
( msts<1535886000000000000 AND msts>1535796060000000000 AND
variable_id=107 AND p_id=661 );
which gives:
rows = 190996998
filtered = 0
ref = NULL
type = range
Any help would be much appreciated!
Upvotes: 1
Views: 242
Reputation: 108806
Your query, rewritten to reorder the WHERE clauses, is this:
SELECT COUNT(msts)
from data
where variable_id=107
and p_id=661
and msts>1535796060000000000
and msts<1535886000000000000;
It contains two equality matches, on variable_id and p_id, followed by a range filter on msts. Therefore, you need an index on the (variable_id, p_id, msts) columns, in that order, to satisfy the query quickly.
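A sketch of adding such an index (the index name here is illustrative):

```sql
-- Secondary index matching the filter order:
-- equality columns first, range column last.
ALTER TABLE data
    ADD INDEX idx_var_p_msts (variable_id, p_id, msts);
```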
Why? You can think of MySQL indexes as sorted in order. To satisfy your query, MySQL random-accesses the index to the first eligible item. It then scans it sequentially until the last item. That's called an index range scan.
Your pre-existing primary key lists msts first. Because the range filter applies to that first column, MySQL must visit every index entry in the msts range and check variable_id and p_id for each one; the two equality conditions can't narrow the scan.
Pro tip 1: Use COUNT(*) when you can instead of COUNT(column). The second is slower because it must skip rows where that column is NULL; the first simply counts every matching row.
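Applied to the query above, that gives:

```sql
-- COUNT(*) counts matching rows without checking
-- any particular column for NULL.
SELECT COUNT(*)
FROM data
WHERE variable_id = 107
  AND p_id = 661
  AND msts > 1535796060000000000
  AND msts < 1535886000000000000;
```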
Pro tip 2: Extra single column indexes are not useful unless they help speed up a specific query.
Pro tip 3: Forcing the use of an index is almost always a bad choice.
Pro tip 4: Read https://use-the-index-luke.com/
Edit: You asked how to do the conversion.
If your table does not yet contain many millions of rows, just change the primary key definition like this.
ALTER TABLE data
DROP PRIMARY KEY,
ADD PRIMARY KEY (variable_id, p_id, msts);
If it already contains billions of rows, you should probably create a new table with the correct definition, copy your existing data into it, then rename the old table to data_old (or similar) and the new table to data. This may be a complex task involving batches of data; if you can't figure it out, ask another question.
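A minimal sketch of that copy-and-swap approach (table names are illustrative; note that CREATE TABLE ... LIKE does not copy foreign key constraints, so you would need to re-add those, and a single INSERT ... SELECT of billions of rows should really be broken into batches):

```sql
-- Build the replacement table with the corrected key.
CREATE TABLE data_new LIKE data;
ALTER TABLE data_new
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (variable_id, p_id, msts);

-- Copy the data (batch this by msts ranges in practice).
INSERT INTO data_new SELECT * FROM data;

-- Atomically swap the tables.
RENAME TABLE data TO data_old, data_new TO data;
```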
Upvotes: 3