Jon Smark
Jon Smark

Reputation: 2608

Index to improve sorting performance?

I have a fairly complex query which includes an ORDER BY and a LIMIT clause. When the ORDER BY uses the primary key, the query takes less than 5 milliseconds. However, if I change the query such that the ORDER BY is done by a different column (of type FLOAT), the response time balloons to more than 50 seconds (four orders of magnitude higher!).

Now, I presume the problem is that the query ordered by primary key performs an index scan, whereas the query ordered by the float column does a sequential scan and requires sorting at the end.

I thought that simply adding an index on the float column would suffice for Postgresql to plan this query in a smarter way. Apparently I was wrong. What may I have missed?

EDIT: I did run EXPLAIN ANALYZE before posting the question. Hence my presumption is not just a wild guess; however, since the output of EXPLAIN ANALYZE runs for more than 30 lines, it's not immediately clear why one query uses the index whereas the other has to sort all the rows.

Upvotes: 6

Views: 6833

Answers (3)

Andomar
Andomar

Reputation: 238296

For a query that returns many rows, it's unusual for a database to use a non-covering index. The cost of the table lookup (from the index to the table data) is too high. A table scan will be used instead.

For example,

select name from people where name > 'N' order by birthdate

Would the database use an index on (birthday)? On the plus side, the rows would be returned in the right order. On the down side, every row would need a table lookup for the name column. The second is much more expensive and so the index would not be used.

An index on (birthday, name) is different. It includes the name, so no table lookup is required. The database can use the index to quickly return rows in the right order.

An index that includes all columns required for a query is called a covering index. Make sure your index includes all columns used by your query, then try again.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271231

It is very hard to decipher what is happening without seeing the query. My guess is that the query plan is able to do the joins based on the table with the primary key, keeping the data in the proper order. The query plan is then basically fetch a row, look up values in other tables, massage them, and return the values in order. The processing goes as far as the limit does.

When you replace this by another column in the order by, all the rows have to be processed. These are sorted and returned. It might be the size of the underlying tables or it might be the size of the result set resulting in longer processing. But, the fundamental reason is that all rows need to be generated.

Upvotes: 2

user80168
user80168

Reputation:

  1. Run explain analyze on the query - so you will not have to guess what happens.
  2. To optimize query you generally have to read explain analyze output, the query, and then figure out the best course of action. Sometimes - it's adding index, sometimes - rewriting the query. but it's not possible to tell which is best for your case, as we don't see explain nor query.

Upvotes: 4

Related Questions