SPIRiT_1984
SPIRiT_1984

Reputation: 2767

Should I avoid ORDER BY in queries for large tables?

In our application, we have a page that displays user a set of data, a part of it actually. It also allows user to order it by a custom field. So in the end it all comes down to query like this:

SELECT name, info, description FROM mytable
WHERE active = 1 -- Some filtering by indexed column
ORDER BY name LIMIT 0,50; -- Just a part of it

And this worked just fine, as long as the size of table is relatively small (used only locally in our department). But now we have to scale this application. And let's assume, the table has about a million of records (we expect that to happen soon). What will happen with ordering? Do I understand correctly, that in order to do this query, MySQL will have to sort a million records each time and give a part of it? This seems like a very resource-heavy operation.

My idea is simply to turn off that feature and don't let users select their custom ordering (maybe just filtering), so that the order would be a natural one (by id in descending order, I believe the indexing can handle that).

Or is there a way to make this query work much faster with ordering?

UPDATE:

Here is what I read from the official MySQL developer page.

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

....

The key used to fetch the rows is not the same as the one used in the ORDER BY:

SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

So yes, it does seem like mysql will have a problem with such a query? So, what do I do - don't use an order part at all?

Upvotes: 3

Views: 2440

Answers (3)

deroby
deroby

Reputation: 6002

The 'problem' here seems to be that you have 2 requirements (in the example)

  • active = 1
  • order by name LIMIT 0, 50

The former you can easily solve by adding an index on the active field The latter you can improve by adding an index on name

Since you do both in the same query, you'll need to combine this into an index that lets you resolve the active value quickly and then from there on fetches the first 50 names.

As such, I'd guess that something like this will help you out:

CREATE INDEX idx_test ON myTable (active, name)

(in theory, as always, try before you buy!)

Keep in mind though that there is no such a thing as a free lunch; you'll need to consider that adding an index also comes with downsides:

  • the index will make your INSERT/UPDATE/DELETE statements (slightly) slower, usually the effect is negligible but only testing will show
  • the index will require extra space in de database, think of it as an additional (hidden) special table sitting next to your actual data. The index will only hold the fields required + the PK of the originating table, which usually is a lot less data then the entire table, but for 'millions of rows' it can add up.
  • if your query selects one or more fields that are not part of the index, then the system will have to fetch the matching PK fields from the index first and then go look for the other fields in the actual table by means of the PK. This probably is still (a lot) faster than when not having the index, but keep this in mind when doing something like SELECT * FROM ... : do you really need all the fields?
  • In the example you use active and name but from the text I get that these might be 'dynamic' in which case you'd have to foresee all kinds of combinations. From a practical point this might not be feasible as each index will come with the downsides of above and each time you add an index you'll add supra to that list again (cumulative).

PS: I use PK for simplicity but in MSSQL it's actually the fields of the clustered index, which USUALLY is the same thing. I'm guessing MySQL works similarly.

Upvotes: 4

Hytool
Hytool

Reputation: 1368

Explain your query, and check, whether it goes for filesort,

If Order By doesnt get any index or if MYSQL optimizer prefers to avoid the existing index(es) for sorting, it goes with filesort.

Now, If you're getting filesort, then you should preferably either avoid ORDER BY or you should create appropriate index(es).

if the data is small enough, it does operations in Memory else it goes on the disk.

so you may try and change the variable < sort_buffer_size > as well.

Upvotes: 2

pjason
pjason

Reputation: 155

there are always tradeoffs, one way to improve the preformance of order query is to set the buffersize and then the run the order by query which improvises the performance of the query

set sort_buffer_size=100000; <>

If this size is further increased then the performance will start decreasing

Upvotes: 0

Related Questions