nitotm
nitotm

Reputation: 579

SQL Improve Efficiency: LIMIT the amount of FILESORT

I’m going to explain myself, with a query like this: (post_id= PRIMARY, blog_id=index )

SELECT post_id FROM posts WHERE blog_id IN (2,3,...) ORDER BY post_id DESC LIMIT 10

Update: the ids in IN() could be numerous. If the DB uses blog_id as key for the query, it has to make a filesort, because the index would look like this:

(blog_id,post_id)-> (1,55) (1,59) (1,69) (2,57) (2,71) (2,72) (3,12)

If instead of IN() you search for just one id blog_id = 2, it does not need to do any filesort because all the matches are already in order.

The problem that I think it’s happening, not 100% sure but just by looking at the queries execution times, is that if I add a LIMIT 10 the efficient way would be to only catch and filesort the last 10 ids of every blog_id index key match, maybe it already does that, but looks like for a IN (2,3,4) ORDER BY post_id DESC LIMIT 10, it filesorts thousands of ids instead of 30.

I hope I’m just dead wrong because if I am not that is a terrible inefficient mistake. If I’m right is there any engine or change I could do? even change database. Currently I’m on 10.1.13-MariaDB and the table is InnoDB

Upvotes: 0

Views: 223

Answers (2)

Rick James
Rick James

Reputation: 142306

Look at EXPLAIN SELECT ...; see if it says "filesort".

Do the following to get details, even for small datasets:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

You do need INDEX(blog_id, post_id). If you are using InnoDB and the table has

PRIMARY KEY(post_id),
INDEX(blog_id)

then you do have that composite index. This is because every secondary index implicitly includes the PK's column(s).

Since you are using MariaDB, see if LIMIT ROWS EXAMINED will do the other thing you asked about.

When the Optimizer sees this:

WHERE blog_id IN (2,3)
ORDER BY post_id DESC LIMIT 10

and it has both INDEX(blog_id) and INDEX(post_id), it makes a decision -- but on limited statistics -- as to which way to go:

Plan A: Filter on blog_id + filesort, or
Plan B: scan in post_id order hoping to find 10 rows soon.

Either one is risky. Plan A, if most or all of the rows are (2,3), will have a big sort. Plan B, when there are fewer than 10 matching rows, will scan the entire table (or index).

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270011

Unfortunately, MySQL does not have an index that lets you do what you want.

However, you can rewrite the query you have and use the existing index:

SELECT p.post_id
FROM ((SELECT post_id
       FROM posts
       WHERE blog_id = 2
       ORDER BY post_id DESC
       LIMIT 10
      ) UNION ALL
      (SELECT post_id
       FROM posts
       WHERE blog_id = 3
       ORDER BY post_id DESC
       LIMIT 10
      )
     ) p
ORDER BY post_id DESC
LIMIT 10;

Each subquery will use the index. And a sort on 20 elements is pretty fast.

Upvotes: 2

Related Questions