Chris
Chris

Reputation: 14218

mysql execution plan less cost with order by?

Why does mysql workbench tell me that:

SELECT * FROM file_results WHERE filter2_dec > 20 LIMIT 3

has a higher cost than:

SELECT * FROM file_results WHERE filter2_dec > 20 ORDER BY filter2_dec DESC LIMIT 3

enter image description here

enter image description here

Upvotes: 0

Views: 865

Answers (4)

Solarflare
Solarflare

Reputation: 11106

This has to do with the way MySQL (currently) optimizes limit. It basically calculates the query as if there is no limit, and then makes some adjustments to the query plan to acknowledge that it might have an effect.

This will have the side-effect that you cannot really trust the cost value anymore - if you could trust it in the first place: its absolute value has not much meaning, it is just important that it lower than all other execution plans.

The theory

First of all, please not that the cost value does not regard the actual number of rows you get. You can e.g. increase the limit value, and it will not change the absolute cost (until it switches to a different execution plan).

Using the index to retrieve 3 rows is obviously a good choice. That is why both queries use that way. But using the (non-covering) index requires a lookup to the table to retrieve the values of the other columns (select *). It will read the index, then read one row from the table, then read the next entry from the index, then read the next row from the table. At some point, it obviously becomes faster to just read the whole table ("full table scan") and throw away the rows you do not need instead of using the index.

If you have to order the data, that points comes later than if you do not have to order the data afterwards.

If you do not use a limit, MySQL will decide if that point is reached by looking at filter2_dec > 20 and guessing how many rows it will get. Try to execute your query without a limit. It will use a full table scan (otherwise use a lower number than 20).

Now increase the value 20, still without limit, to a value high enough so you only get some rows (but more than 0); let's assume that value is 1000.

Now the tricky part: if you add a limit, that point (the value for which it is better to use a range scan instead of a table scan) will obviously change.

MySQL will try to include the effect of that limit by doing some modifications in the list of all the possible execution plans that were generated as if there was no limit. E.g. throwing away some of those or add modified copies based on them. This can result in strange values for the cost (as they are not actually the cost for that operation, but of the original plan).

The practice

Let's see this in action. As it will depends on the exact optimizer and thus mysql version, so your actual behaviour may vary though. But for this simple query, it should still hold in general. I will assume that SELECT * FROM file_results WHERE filter2_dec > 20 will use a full table scan, and SELECT * FROM file_results WHERE filter2_dec > 1000 will use the index on it own (and will actually return some rows). If not, adjust the value or add more rows (and maybe run an optimize table).

First of all, try the following two queries:

SELECT * FROM file_results WHERE filter2_dec > 20 
order by filter2_dec 

SELECT * FROM file_results WHERE filter2_dec > 20 
order by filter2_dec limit 10

should show the same cost for the first step, but it will be a "full table scan" for the first one, and a "index range scan" for the other.

The limit basically took the original plan and replaced the "full table scan" with an "index range scan", but it was not able to calculate the new costs. MySQL just "knew" that because of the limit, a full table scan would be ridiculous to use, and changed it to a range scan. The "order by" itself does not cost anything anymore.

Now check the following 3 queries:

SELECT * FROM file_results WHERE filter2_dec > 20 

SELECT * FROM file_results force index (file_results_filter2_dec_index)
WHERE filter2_dec > 20 

SELECT * FROM file_results WHERE filter2_dec > 20 limit 10

The 2nd and the 3rd one should have the same cost, while the first one should be cheaper - but got thrown out by the limit-operation, so the 2nd (and 3rd) are now the cheapest way to execute the query (you can take this literally: the limit forces you to use the index, as everything else would be just stupid). And finally, this is the reason your query looks more expensive than the other one. It is based on a different plan.

A last check:

SELECT * FROM file_results WHERE filter2_dec > 1000 
order by filter2_dec

SELECT * FROM file_results WHERE filter2_dec > 1000 
order by filter2_dec limit 10

SELECT * FROM file_results WHERE filter2_dec > 1000 limit 10

should all show the same execution plan and identical costs (with or without the limit). That plan will be chosen on its own even without the limit. The adjustments done by the limit-optimizer does not need these plans, only plans that were more expensive to begin with.

Upvotes: 1

Ryan Gadsdon
Ryan Gadsdon

Reputation: 2378

Its most likely because in the order by query you have specified for it to be DESC. This is easier to find because the data is already ordered. Whereas in your other statement, it still has to run through every entry and then limit to the top 3.

The order by is less because you limit after that clause. The data is already ordered so its easier to display a result. Whereas the more costly statement has to do that itself and then limit. This will also be effected by indexes, DBMS and will likely perform differently in an actual live environment.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

I would guess the following.

In the first query, filter2_dec > 20 is not very selective. So, MySQL says: "I'm going to be selecting lots of rows, so I'll just do a full table scan". This is probably enhanced by the LIMIT because if the values are randomly stored in rows, it will probably hit 3 pretty quickly. The choice of using the index is rejected based on the selectivity of the WHERE clause.

In the second query, the ORDER BY keeps the option of using the index as a possibility. If the engine does a full table scan, then it then needs to sort all the data. Instead, using the index makes more sense -- and there ends up being an overall savings.

In other words, the selectivity of the WHERE clause rejects the use of the index before the cost of the index is considered. In the second, the option remains open because of the ORDER BY.

This is speculation, but it would explain what you are seeing. Optimizers are hard.

Upvotes: 0

user8556290
user8556290

Reputation:

May be cos filter2_dec got on is structure column a index. But if you need some good analyst i think you have to past the structure about that table the engine using ... Regards

Upvotes: 0

Related Questions