samol
samol

Reputation: 20570

Improve performance on multicolumn index and sort

SELECT * FROM table1
WHERE (col1, col2) IN (($1, $2), ($3, $4))
ORDER  BY col3
LIMIT  10;

Output of EXPLAIN ANALYZE:

 Limit  (cost=59174.75..59174.77 rows=10 width=113) (actual time=3632.627..3632.661 rows=10 loops=1)
   ->  Sort  (cost=59174.75..59180.22 rows=2188 width=113) (actual time=3632.623..3632.634 rows=10 loops=1)
         Sort Key: col3
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Nested Loop  (cost=2.62..59127.46 rows=2188 width=113) (actual time=0.234..3561.309 rows=38347 loops=1)
   ...........
   Total runtime: 3632.818 ms

But when I remove the order:

SELECT * FROM table1 WHERE (col1, col2) IN (($1, $2), ($3, $4)) LIMIT 10;
 Limit  (cost=2.62..272.85 rows=10 width=105) (actual time=0.258..1.143 rows=10 loops=1)
   ->  Nested Loop  (cost=2.62..59127.46 rows=2188 width=105) (actual time=0.255..1.115 rows=10 loops=1)
........
Total runtime: 1.306 ms
  1. There is a composite btree index on (col1, col2) and a btree index on col3.
  2. Write performance and storage is not a priority. The read performance is paramount and needs to be as fast as possible.
  3. This must be able to support querying with IN clause: WHERE (col1, col2) IN (($1, $2), ($3, $4)) ORDER BY col3 LIMIT 10;. (Look ups are always with an IN clause and then order.)

Note: Is it possible to create an index on (col1, col2, col3)? That would look up using (col1, col2) and have col3 already ordered ...

Upvotes: 1

Views: 1383

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656391

Yes. You got your answer in the question already.
For the given query, a multicolumn index on (col1, col2, col3) should be perfect. Just try it.

More about order of columns in a multicolumn B-tree index under this related question on dba.SE:
Is a composite index also good for queries on the first field?

In addition, if you don't actually need all columns from table1, only put the required ones in the SELECT list instead of * to gain performance.

IN

As for your added requirement:

WHERE (col1, col2) IN (($1, $2), ($3, $4))

is equivalent to:

WHERE (col1 = $1 AND col2 = $2 OR
       col1 = $3 AND col2 = $4)

This reduces the effectiveness of the index over (col1, col2, col3), since Postgres cannot just fetch the pre-sorted list from the index. It depends. The fewer items in your IN list and the more rows with the same col3 per (col1, col2) there are, the more you can gain from said index.

You'll have to test. Create the index additionally, be sure your server is configured reasonably, statistics are up to date (ANALYZE) and your cost settings are reasonable and then EXPLAIN will show what Postgres picks. Be sure to run a set of queries that represents your use cases. In the end, remove indexes that don't get used.

Trick Postgres into using special index effectively

The sort step seems to be the expensive part. Try this alternative query: one UNION ALL leg per item in your IN list. This makes an offer to Postgres that it can't refuse: the special index fits the legs of this query perfectly. And the final sort step is cheap for only a few IN items.

(
SELECT *
FROM   table1
WHERE  col1 = $1 AND col2 = $3
ORDER  BY col3
LIMIT  10
)
UNION  ALL
(
SELECT *
FROM   table1
WHERE  col1 = $3 AND col2 = $4
ORDER  BY col3
LIMIT  10
)
... UNION  ALL ...
ORDER  BY col3
LIMIT  10

Note that all parentheses are required to allow ORDER BY and LIMIT for each leg in addition to the final ORDER BY and LIMIT.

Upvotes: 2

Related Questions