Reputation: 2608
I have a fairly complex query which includes an ORDER BY
and a LIMIT
clause. When the ORDER BY
uses the primary key, the query takes less than 5 milliseconds. However, if I change the query such that the ORDER BY
is done by a different column (of type FLOAT
), the response time balloons to more than 50 seconds (four orders of magnitude higher!).
Now, I presume the problem is that the query ordered by primary key performs an index scan, whereas the query ordered by the float column does a sequential scan and requires sorting at the end.
I thought that simply adding an index on the float column would suffice for Postgresql to plan this query in a smarter way. Apparently I was wrong. What may I have missed?
EDIT: I did run EXPLAIN ANALYZE
before posting the question. Hence my presumption is not just a wild guess; however, since the output of EXPLAIN ANALYZE
runs for more than 30 lines, it's not immediately clear why one query uses the index whereas the other has to sort all the rows.
Upvotes: 6
Views: 6833
Reputation: 238296
For a query that returns many rows, it's unusual for a database to use a non-covering index. The cost of the table lookup (from the index to the table data) is too high. A table scan will be used instead.
For example,
select name from people where name > 'N' order by birthdate
Would the database use an index on (birthday)
? On the plus side, the rows would be returned in the right order. On the down side, every row would need a table lookup for the name
column. The second is much more expensive and so the index would not be used.
An index on (birthday, name)
is different. It includes the name, so no table lookup is required. The database can use the index to quickly return rows in the right order.
An index that includes all columns required for a query is called a covering index. Make sure your index includes all columns used by your query, then try again.
Upvotes: 0
Reputation: 1271231
It is very hard to decipher what is happening without seeing the query. My guess is that the query plan is able to do the joins based on the table with the primary key, keeping the data in the proper order. The query plan is then basically fetch a row, look up values in other tables, massage them, and return the values in order. The processing goes as far as the limit
does.
When you replace this by another column in the order by
, all the rows have to be processed. These are sorted and returned. It might be the size of the underlying tables or it might be the size of the result set resulting in longer processing. But, the fundamental reason is that all rows need to be generated.
Upvotes: 2
Reputation:
Upvotes: 4