Lance Bryant
Lance Bryant

Reputation: 157

Mysql query optimization Multi Column Index solves this slowness?

+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
| Field                      | Type                                                                         | Null | Key | Default | Extra          |
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
| type                       | enum('Website','Facebook','Twitter','Linkedin','Youtube','SeatGeek','Yahoo') | NO   | MUL | NULL    |                |
| name                       | varchar(100)                                                                 | YES  | MUL | NULL    |                |
| processing_interface_id    | bigint(20)                                                                   | YES  | MUL | NULL    |                |
| processing_interface_table | varchar(100)                                                                 | YES  | MUL | NULL    |                |
| create_time                | datetime                                                                     | YES  | MUL | NULL    |                |
| run_time                   | datetime                                                                     | YES  | MUL | NULL    |                |
| completed_time             | datetime                                                                     | YES  | MUL | NULL    |                |
| reserved                   | int(10)                                                                      | YES  | MUL | NULL    |                |
| params                     | text                                                                         | YES  |     | NULL    |                |
| params_md5                 | varchar(100)                                                                 | YES  | MUL | NULL    |                |
| priority                   | int(10)                                                                      | YES  | MUL | NULL    |                |
| id                         | bigint(20) unsigned                                                          | NO   | PRI | NULL    | auto_increment |
| status                     | varchar(40)                                                                  | NO   | MUL | none    |                |
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+

select *  from remote_request use index ( processing_order )  where remote_request.status = 'none' and type = 'Facebook' and reserved = '0' order by priority desc limit 0, 40;

This table receives an extremely large amount of writes and reads. each remote_request ends up being a process, which can spawn anywhere between 0 and 5 other remote_requests depending on the type of request, and what the request does.

The table is currently sitting at about 3.5 Million records, and it goes to a snail pace when the site itself is under heavy load and I have more then 50 or more instances running simultaneously. (REST requests are the purpose of the table just in case you were not sure).

As the table grows it just gets worse and worse. I can clear the processed requests out on a daily basis but ultimatly this is not fixing the problem.

What I need is for this query to always have a very low response ratio.

Here are the current indexes on the table.

+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name                         | Seq_in_index | Column_name                | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| remote_request |          0 | PRIMARY                          |            1 | id                         | A         |     2403351 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | type_index                       |            1 | type                       | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | processing_interface_id_index    |            1 | processing_interface_id    | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | processing_interface_table_index |            1 | processing_interface_table | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | create_time_index                |            1 | create_time                | A         |      160223 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | run_time_index                   |            1 | run_time                   | A         |      343335 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | completed_time_index             |            1 | completed_time             | A         |      267039 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | reserved_index                   |            1 | reserved                   | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | params_md5_index                 |            1 | params_md5                 | A         |     2403351 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | priority_index                   |            1 | priority                   | A         |         716 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | status_index                     |            1 | status                     | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | name_index                       |            1 | name                       | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            1 | priority                   | A         |         200 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            2 | status                     | A         |         200 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            3 | type                       | A         |         200 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            4 | reserved                   | A         |         200 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Any idea how i solve this? Is it not possible to make some sort of complicated index that would automatic order them with priority, then take the first 40 that match the 'Facebook' type? It currently is scanning more then 500k rows of the table before it returns a result which is grossly inefficient.

Some other version of the query that I have been tinkering with are:

select *  from remote_request use index ( type_index,status_index,reserved_index,priority_index )  where remote_request.status = 'none' and type = 'Facebook' and reserv                          ed = '0' order by priority desc limit 0, 40

It would be amazing if we could get the rows scanned to under 1000 rows depending on just how many types of requests enter the table.

Thanks in advance, this might be a real nutcracker for most except the most experienced mysql experts?

Upvotes: 1

Views: 341

Answers (2)

Bill Karwin
Bill Karwin

Reputation: 562931

Your four-column index has the right columns, but in the wrong order.

You want the index to first look up matching rows, which you do by three columns. You are looking up by three equality conditions, so you know that once the index finds the set of matching rows, the order of these rows is basically a tie with respect to those first three columns. So to resolve the tie, add as the fourth column the column by which you wanted to sort.

If you do that, then the ORDER BY becomes a no-op, because the query can just read the rows in the order they are stored in the index.

So I would create the following index:

CREATE INDEX processing_order2 ON remote_request 
 (status, type, reserved, priority);

There's probably not too much significance to the order of the first three columns, since they're all in equality terms combined with AND. But the priority column belongs at the end.

You may also like to read my presentation How to Design Indexes, Really.

By the way, using USE INDEX() shouldn't be necessary if you have the right index, MySQL's optimizer will choose it automatically most of the time. But USE INDEX() can block the optimizer from considering a new index that you create, so it becomes a disadvantage for code maintenance.

Upvotes: 1

Cfreak
Cfreak

Reputation: 19319

This isn't a complete answer but it was too long for a comment:

Are you actually searching on all of those indexes? If not get rid of some. Extra indexes slow down writes.

Secondly use EXPLAIN on your query and don't specify an index when you do. See how MySQL wants to process it rather than forcing an option (Generally it does the right thing).

Finally sorting is likely what hurts you the most. If you don't sort it probably gets the records pretty quickly. It has to scan and sort every row that meets your criteria before it can return the top 40.

Options:

  1. Try creating a VIEW (not as familiar with VIEWS but it might work)
  2. Split this table into smaller tables
  3. use a third party tool such as Sphinx or Lucene to create specialized indexes to search on. (I've used Sphinx for something like this before. You can find it at http://sphinxsearch.com/).
  4. Or look into using a NoSQL solution where you can use a Map function to do it.

Edit I read a bit about using VIEW and I don't think it will help you in your case because you have such a large table. See the answer in this thread: Using MySQL views to increase performance

Upvotes: 1

Related Questions