Jack

Reputation: 1

MySQL table with 250M+ rows and slow query speed

I have a table with columns like this:

+----------------+------------+------+-----+-------------------+
| Field          | Type       | Null | Key | Default           |
+----------------+------------+------+-----+-------------------+
| seqid          | bigint(20) | NO   | PRI | 0                 |
| Time           | timestamp  | NO   | PRI | CURRENT_TIMESTAMP |
| DevId          | text       | YES  | MUL | NULL              |
+----------------+------------+------+-----+-------------------+

The table is partitioned by the value of the Time column, and the prefix length of the index on DevId covers the full length of every possible value.
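A schema sketch matching this description (the table name, index name, prefix length, and partition boundaries are illustrative assumptions, not taken from the question):

```sql
-- Hypothetical reconstruction of the described table. The partition
-- ranges and the prefix length 64 are guesses; the question only says
-- the index prefix covers every possible DevId value.
CREATE TABLE readings (
    seqid BIGINT NOT NULL DEFAULT 0,
    Time  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    DevId TEXT,
    PRIMARY KEY (seqid, Time),
    KEY DevID_IDX (DevId(64))
)
PARTITION BY RANGE (UNIX_TIMESTAMP(Time)) (
    PARTITION p2013 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-01 00:00:00')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Note that the partitioning column (Time) must be part of every unique key, which is why it appears in the primary key alongside seqid.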

Right now the table has 250M+ rows (may grow to 400M+), and ~18K to ~20K distinct DevIds.

When I try to run a query like:

select * from TABLE where DevId='00:1F:23:31:44:48'

It takes 30 to 90 seconds to run, and another 30 to 90 seconds to fetch the ~20,000 matching rows. The output of EXPLAIN shows something like this:

+----+-------------+------+-------------+---------+-------+------+-------------+
| id | select_type | type | key         | key_len | ref   | rows | Extra       |
+----+-------------+------+-------------+---------+-------+------+-------------+
|  1 | SIMPLE      | ref  | DevID_IDX   | 387     | const |21042 | Using where |
+----+-------------+------+-------------+---------+-------+------+-------------+

There are several things here that make me wonder what is going on.

Thanks in advance for any suggestion.

Upvotes: 0

Views: 1248

Answers (2)

Jack
Jack

Reputation: 11

Thanks for all your replies. Yes, DevId is a MAC address in this case, but it could be anything else. Changing it to VARCHAR did help a little, but not much, since in recent MySQL versions a short TEXT column actually uses almost the same amount of space as a VARCHAR.

After intensive investigation I actually found the root cause myself, and it's seldom mentioned elsewhere. I'd like to share what I learned here, and also ask for everyone's opinion.

As everyone knows, MySQL can use only a single index for this query. The fetch time is long because InnoDB stores row data on the leaf nodes of the primary index, which in my case is a random number plus a timestamp (the timestamp has to be included because I need to partition the table). This means that rows for the same DevId cannot be stored sequentially, which leads to a huge performance limitation. According to my tests, storing the DevId as an integer helped a bit with loading the index, but it didn't help with fetching the row data, which is, unfortunately, the real cause of the slowness.

However, for my application it's rare to query a long period of data for the same DevId. Most of the time it's less than one day, between 1,000 and 2,000 rows. If MySQL only needs to access those rows, it's actually quite fast. The problem is that if I run a plain query like:

select * from TABLE where DevId='00:1F:23:31:44:48' and Time between T1 and T2;

with 400M+ rows in the table, MySQL will pick the Time index, which is a bad choice because it usually means a range scan of 100K+ rows, sometimes up to 10M rows. Even when the DevId index is chosen, MySQL will do a range scan and then filter on the Time column's values, which doesn't help much either. I had hoped MySQL would be smart enough to use the covering-index technique: since the DevId index (the secondary index) contains the DevId and the primary key (in my case, the random number and Time), MySQL should be able to determine which rows to fetch by looking only at the secondary index.

But the ugly reality is that it doesn't. So it turns out I need to use a small join like:

select * from TABLE as a1 join
    (select seqid from TABLE use index (DevIDIndex)
     where DevID='....' and Time between T1 and T2) as a2
on a1.seqid=a2.seqid

For one day of data, this loads in several seconds, down from minutes.
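The deferred-join shape described above can be demonstrated outside MySQL. Here is a minimal sketch using Python's sqlite3 module (the table name, column values, and row counts are invented, and SQLite's optimizer differs from MySQL's, so this only illustrates the query structure, not the performance effect):

```python
import sqlite3

# In-memory toy table mimicking the described schema. All names and
# data here are illustrative assumptions, not from the original post.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (seqid INTEGER PRIMARY KEY, Time INTEGER, DevId TEXT)")
con.execute("CREATE INDEX DevID_IDX ON readings (DevId, Time)")
rows = [(i, 1000 + i, "00:1F:23:31:44:48" if i % 3 == 0 else "other")
        for i in range(300)]
con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Deferred join: first resolve the matching primary keys through the
# secondary index alone, then fetch full rows by primary key.
deferred = """
SELECT a1.* FROM readings AS a1
JOIN (SELECT seqid FROM readings
      WHERE DevId = ? AND Time BETWEEN ? AND ?) AS a2
ON a1.seqid = a2.seqid
"""
result = con.execute(deferred, ("00:1F:23:31:44:48", 1000, 1100)).fetchall()
print(len(result))  # number of matching rows in the time window
```

The inner subquery touches only columns present in the secondary index (DevId, Time, and the primary key), so the outer query performs one point lookup per matching row instead of a wide range scan.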

Upvotes: 1

MajidTaheri

Reputation: 3983

Your field type is TEXT, and that datatype is slow for I/O operations. Please change it to CHAR/VARCHAR with an ASCII collation.
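A sketch of the suggested change (the table name is an assumption carried over from the question; 17 characters is enough for a MAC address in `xx:xx:xx:xx:xx:xx` form, but the question notes DevId "can be anything else", so the length would need adjusting):

```sql
-- Hypothetical: convert the TEXT column to a compact ASCII VARCHAR.
-- Table name and length are illustrative, not from the original post.
ALTER TABLE readings
    MODIFY DevId VARCHAR(17)
    CHARACTER SET ascii COLLATE ascii_general_ci;
```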

Upvotes: 0
