user900889
user900889

Reputation: 75

How to speed up mysql select in database with highly redundant key values

I have a very simple MYSQL database with only 3 columns but several millions of rows. Two of the colums (hid1, hid2) describe study objects (about 50,000 of them) and the third column (score) is the result of a comparison of hid1 with hid2. Thus, the number of rows is max(hid1)*max(hid2), which is quite a big number. Because the table has to be written only once and read many million times, I selected a MyISAM table (I hope this was a good idea). Initially, it was planned that I would retrieve 'score' for a given pair of hid1,hid2 but it turned out to be more convenient to retrieve all scores (and hid2) for a given hid1.

My table ("result") looks like this:

+-------+-----------------------+------+-----+---------+-------+
| Field | Type                  | Null | Key | Default | Extra |
+-------+-----------------------+------+-----+---------+-------+
| hid1  | mediumint(8) unsigned | YES  | MUL | NULL    |       |
| hid2  | mediumint(8) unsigned | YES  |     | NULL    |       |
| score | float                 | YES  |     | NULL    |       |
+-------+-----------------------+------+-----+---------+-------+

and a typical query would be

select hid1,hid2,score from result where hid1=13531 into outfile "/tmp/ttt"

Here is the problem: The query just takes too long, at least sometimes. For some 'hid1' values, I get the result back in under a second. For other hid1 (particularly for big numbers), I have to wait for up to 40 sec. As I said, I have to run thousands of these queryies, so I am interested in speeding things up.

Let me reiterate: there are about 50,000 hits to the query, and I don't need them in any particular order. Am I doing something wrong here, or is a relational database like MySQL not up to this task?

What I already tried is to increase the key_buffer in /etc/mysql/my.conf this appeared to help, but not much. The index on hid1 is a few GB, does the key_buffer have to be bigger than the index size to be effective?

Any hint would be appreciated.


Edit: here is an example run with the corresponding 'explain' output:

select hid1,hid2,score from result where hid1=132885 into outfile "/tmp/ttt"
Query OK, 16465 rows affected (31.88 sec)

As you can see below, the index hid1_idx is actually being used:

mysql> explain select hid1,hid2,score from result where hid1=132885 into outfile "/tmp/ttt";
+----+-------------+--------+------+---------------+------------+---------+-------+-------+-------------+
| id | select_type | table  | type | possible_keys | key        | key_len | ref   | rows  | Extra       |
+----+-------------+--------+------+---------------+------------+---------+-------+-------+-------------+
|  1 | SIMPLE      | result | ref  | hid1_index    | hid1_index | 4       | const | 15456 | Using where |
+----+-------------+--------+------+---------------+------------+---------+-------+-------+-------------+

1 row in set (0.00 sec)

What I do find puzzling is the fact that query with low numbers for hid1 always are much faster than those with high numbers. This is not what I would expect from using an index.

Upvotes: 0

Views: 2718

Answers (3)

Ronnis
Ronnis

Reputation: 12833

Two random suggestions, based on a query pattern that always involve equality filter on hid1:

  1. Use InnoDB table instead and take advantage of a clustered index on (hid1, hid2). That way all rows belonging to the same hid will be physically located together, and this will speed up retreival.

  2. Hash-partition the table on hid1, with a suitable nr of partitions.

Upvotes: 2

Alex Siri
Alex Siri

Reputation: 2864

The simplest way to optimize a query like that, would be to use an index. A simple thing like

alter table results add index(hid1)

would improve the query you sent. Even more, if you want to search by both fields at once, you can use both fields in the index.

alter table results add index(hid1, hid2)

That way, MySQL can access results in a very organized way, and find the information you want.

If you run an explain on the first query, you might see something like

| select_type | table  | type|possible_keys| rows   |Extra
| SIMPLE      | results| ALL |             | 7765605| Using where

After adding the index, you should see

| select_type | table  | type|possible_keys| rows   |Extra
| SIMPLE      | results| ref |hid1         | 2816304|

Which is telling you, in the first case, that it needs to check ALL the rows, and in the second case, that it can find the information using a ref

Upvotes: 1

mrjink
mrjink

Reputation: 1129

If you know the combination of hid1 and hid2 is unique, you should consider making that your primary key. That will automatically also add an index to hid1. See: http://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html

Also, check the output of EXPLAIN. See: http://dev.mysql.com/doc/refman/5.5/en/select-optimization.html and related links.

Upvotes: 0

Related Questions