Hamed Kamrava
Hamed Kamrava

Reputation: 12847

MySQL query becomes extremely slow when using Order By

I have a messages table with 15 million rows.

Below query returns about 5 million(But limited in 15 rows) records in less than 1 second :

SELECT messages.* FROM messages 
INNER JOIN gateways ON
messages.gateway_id=gateways.id
INNER JOIN orders ON
gateways.order_id=orders.id
WHERE orders.user_id=6500 AND messages.deleted=0
AND messages.type='Out' LIMIT 15;

But when I adding an Order ByidDESC to the end of it, it becomes extremely slow about ~40 seconds :

SELECT messages.* FROM messages 
INNER JOIN gateways ON
messages.gateway_id=gateways.id
INNER JOIN orders ON
gateways.order_id=orders.id
WHERE orders.user_id=6500 AND messages.deleted=0
AND messages.type='Out' ORDER BY messages.id DESC LIMIT 15;

Any help would be great appreciated.

Upvotes: 1

Views: 167

Answers (2)

bobflux
bobflux

Reputation: 11581

I assume that

  • each order belongs to one user
  • each gateway belongs to one order

Therefore, this:

INNER JOIN gateways ON messages.gateway_id=gateways.id
INNER JOIN orders   ON gateways.order_id=orders.id
WHERE orders.user_id=6500 AND messages.deleted=0

can be rephrased into English as:

"Get the gateways which belong to the orders which belong to this user".

Now, to get the most recent messages related to this user, the problem is we will likely have many different gateway_id's (about 143 according to your EXPLAIN), so we can't use an index to skip the sort.

Well, we could, as O. Jones demonstrated, but there is a catch. Here is the simplified version of the query:

SELECT ... FROM messages
WHERE gateway_id IN (1,2) ORDER BY id DESC LIMIT 10

If we have an index on (id,gateway_id) then MySQL will most likely decide to scan it in descending order. If it quickly finds 10 messages having "gateway_id IN (1,2)" then it will be fast. However, it could need to scan the entire index if these gateway_ids have very old messages, or no messages at all.

If the PK relationships are as I described, I would materialize a user_id column inside the messages table, which would then allow an index on (user_id,message_id) which would give query timings well under a millisecond.

Upvotes: 1

O. Jones
O. Jones

Reputation: 108651

The pattern SELECT lots_of_stuff ORDER BY something LIMIT small_integer is notorious for causing performance trouble. Leaving out ORDER BY something makes the performance trouble go away. Why? Because the pattern with ORDER BY causes the MySQL server to sort a very large number of fairly large rows (5 million in your case), only to discard all but a small number of them. That uses a lot of RAM, CPU, and IO in your server, just to discard most of the work.

Your best bet is to use a deferred join type of pattern here, where you sort nothing but the message.id values. Use this subquery to do that.

                   SELECT messages.id 
                     FROM messages 
               INNER JOIN gateways ON messages.gateway_id=gateways.id
               INNER JOIN orders   ON gateways.order_id=orders.id
                    WHERE orders.user_id=6500
                      AND messages.deleted=0
                      AND messages.type='Out'
                 ORDER BY messages.id DESC
                   LIMIT 15

This will give you a nice little collection of 15 message.id values.

Your next step is to work on optimizing this subquery. I suggest you try a compound covering index on your messages table, containing the columns (deleted, type, id, gateway_id). This shoud help accelerate it.

You may need indexes on the other tables as well. You should consider using the EXPLAIN function in MySQL to analyze your performance.

Finally, use that little collection of messages.id values to get the messages rows you need, like this. (This is the deferred join; you're deferring fetching the whole row until you know which rows you'll need. That way you don't have to ORDER the whole mess.)

Edit Add a compound index on gateways (order_id, id) to avoid the full table scan of that table. It's not enormous, but this might help a bit.

SELECT a.*
  FROM messages a
  JOIN (
                   SELECT messages.id 
                     FROM messages 
               INNER JOIN gateways ON messages.gateway_id=gateways.id
               INNER JOIN orders   ON gateways.order_id=orders.id
                    WHERE orders.user_id=6500
                      AND messages.deleted=0
                      AND messages.type='Out'
                 ORDER BY messages.id DESC
                   LIMIT 15
       ) b ON a.id = b.id
 ORDER BY a.id DESC

Upvotes: 1

Related Questions