Reputation: 1
I have a table with columns like this:
| Field | Type       | Null | Key | Default           |
| seqid | bigint(20) | NO   | PRI | 0                 |
| Time  | timestamp  | NO   | PRI | CURRENT_TIMESTAMP |
| DevId | text       | YES  | MUL | NULL              |
The table is partitioned by the value of the Time column, and the index prefix on DevId covers the full length of every possible value.
Right now the table has 250M+ rows (and may grow to 400M+), with ~18K to ~20K distinct DevIds.
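For context, the layout described above corresponds to something like the following sketch. The table name, partition boundaries, and the 128-character index prefix are assumptions, not the actual DDL:

```sql
-- Hedged reconstruction of the schema described above; the table name,
-- partition boundaries, and index prefix length are assumptions.
CREATE TABLE samples (
  seqid  BIGINT NOT NULL DEFAULT 0,
  `Time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  DevId  TEXT,
  PRIMARY KEY (seqid, `Time`),   -- both columns are marked PRI above
  KEY DevID_IDX (DevId(128))     -- prefix index on the TEXT column
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(`Time`)) (
  PARTITION p2013 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-01 00:00:00')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Note that MySQL requires every unique key of a partitioned table to include the partitioning column, which is why Time has to be part of the primary key here.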
When I try to run a query like:
select * from TABLE where DevId='00:1F:23:31:44:48'
It takes 30 to 90 seconds to run the query, and another 30 to 90 seconds to fetch the ~20,000 matching rows. The output of EXPLAIN looks like this:
+----+-------------+------+-------------+---------+-------+------+-------------+
| id | select_type | type | key | key_len | ref | rows | Extra |
+----+-------------+------+-------------+---------+-------+------+-------------+
| 1 | SIMPLE | ref | DevID_IDX | 387 | const |21042 | Using where |
+----+-------------+------+-------------+---------+-------+------+-------------+
Several things make me wonder:
Why is key_len 387? I know MySQL uses more bytes per character for UTF-8 encoded tables, but 387 still looks far too long for the values (all of them are 17 characters).
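One plausible breakdown of that 387 (an assumption based on the column definition, not something the EXPLAIN output states): a utf8 column uses up to 3 bytes per character, a 128-character index prefix gives 128 × 3 = 384 bytes, a variable-length column adds 2 length bytes, and a nullable column adds 1 more byte:

```sql
-- 128-char utf8 prefix + 2-byte length + 1-byte NULL flag (assumed breakdown)
SELECT 128 * 3 + 2 + 1 AS key_len;  -- 387
```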
Why does MySQL need 30 to 90 seconds to fetch the index records? I know MySQL has a 16KB page size, that secondary index records are stored in the leaf nodes of a B-tree, and that each page may be only 1/2 to 15/16 full. That means it may need to seek 30 or 40 pages on disk; 90 seconds definitely looks too long for that.
Is the InnoDB innodb_file_per_table option going to help? Is there any other way to improve the search speed? Our goal is a few seconds for a single query with a device id and a time constraint.
Thanks in advance for any suggestion.
Upvotes: 0
Views: 1248
Reputation: 11
Thanks for all your replies. Yes, DevId is a MAC address in this case, but it can be anything else. Changing it to VARCHAR did help a little, but not much, since in recent MySQL versions a short TEXT column uses almost the same amount of space as VARCHAR.
After intensive investigation I actually found the root cause myself, and it's seldom mentioned elsewhere. I'd like to share what I learned here, and also ask for everyone's opinion.
As everyone knows, MySQL can use only a single index for this query. The fetch time is long because InnoDB stores row data in the leaf nodes of the primary (clustered) index, which in my case is a random number plus a timestamp. The timestamp has to be included because I need to partition the table. This means rows for the same DevId cannot be stored sequentially, which imposes a huge performance limitation. According to my tests, storing the DevId as an integer helped a bit with loading the index, but it doesn't help with fetching the row data, which is, unfortunately, the real cause of the slowness.
However, for my application it's rare to query a long period of data for the same DevId. Most of the time it's less than one day, between 1,000 and 2,000 rows. If MySQL only needed to access those rows, it would actually be quite fast. The problem is, if I run a plain query like:
select * from TABLE where DevId='00:1F:23:31:44:48' and Time between T1 and T2;
with 400M+ rows in the table, MySQL picks the Time index, which is actually a bad guess because it often means a range scan of 100K+ rows, sometimes up to 10M rows. Even when the DevId index is chosen, MySQL does a range scan on it and filters by the Time column values, which doesn't help much either. I hoped MySQL would be smart enough to use the covering-index technique: since the DevId index (the secondary index) contains the DevId and the primary key (in my case the random number and Time), MySQL should be able to determine which rows to fetch just by looking at the secondary index.
But the ugly reality is, it doesn't. So it turns out I need to use a small join like:
select a1.* from TABLE as a1
join (select seqid from TABLE use index (DevIDIndex)
      where DevID='....' and Time between T1 and T2) as a2
on a1.seqid = a2.seqid;
For one day of data, it now loads in several seconds, down from minutes.
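An alternative, schema-level way to attack the locality problem described above is to lead the clustered index with the device, so rows for one device are physically stored together. This is only a hedged sketch under assumed names (dev_int as an integer mapping of DevId), not the schema I actually use:

```sql
-- Hedged sketch: lead the primary key with an integer device id so that
-- one device's rows are clustered together on disk. Table and column
-- names are assumptions, not from the original post.
CREATE TABLE samples2 (
  dev_int BIGINT NOT NULL,               -- DevId mapped to an integer
  `Time`  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  seqid   BIGINT NOT NULL,
  PRIMARY KEY (dev_int, `Time`, seqid)   -- still includes Time, so
) ENGINE=InnoDB                          -- partitioning remains legal
PARTITION BY RANGE (UNIX_TIMESTAMP(`Time`)) (
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

With this layout a query on dev_int plus a Time range becomes a short sequential read of the clustered index, at the cost of rewriting the table and maintaining the DevId-to-integer mapping.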
Upvotes: 1
Reputation: 3983
Your field type is TEXT, and this datatype is slow for I/O operations. Please change it to CHAR/VARCHAR with an ASCII collation.
Upvotes: 0