Backend Viking
Backend Viking

Reputation: 63

Chosing the right index for PostgreSQL query

Simplified table:

CREATE TABLE products (
product_no integer PRIMARY KEY,
sales integer,
status varchar(16),
category varchar(16));

CREATE INDEX index_products_sales ON products (sales);
CREATE INDEX index_products_status ON products (status);
CREATE INDEX index_products_category ON products (category);

PostgreSQL version is 8.4. Columns 'status' and 'category'

There are 20 million products/rows spread across 15 categories.

One of the most used queries is getting the three most sold products, excluding products in categories 'cat3' and 'cat7':

SELECT product_no, sales 
FROM products 
WHERE status = 'something' AND category NOT IN ('cat3', 'cat7') 
ORDER BY sales DESC 
LIMIT 3;

Limit  (cost=0.00..8833.39 rows=3 width=12) (actual time=9235.332..9356.284 rows=3 loops=1)
   ->  Index Scan using index_products_sales on products  (cost=0.00..68935806.85 rows=23412 width=12) (actual time=9235.327..9356.278 rows=3 loops=1)
     Filter: (((category)::text <> ALL ('{cat3,cat7}'::text[])) AND ((status)::text = 'something'::text))

What would be the best index for making this specific query run faster?

Upvotes: 3

Views: 6140

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656431

Create a partial, multicolumn index with this particular sort order:

CREATE INDEX products_status_sales_partial_idx ON products (status, sales DESC)
WHERE  category NOT IN ('cat3','cat7');

Modify your query slightly:

SELECT product_no, sales 
FROM   products 
WHERE  status = 'something'
AND    category NOT IN ('cat3', 'cat7') 
ORDER  BY status, sales DESC 
LIMIT  3;

Adding status as first element of the ORDER BY clause seems redundant and pointless. But give it a try.

Why?

The query planner is not smart enough to understand, that with

WHERE  status = 'something' ...
ORDER  BY sales DESC

the sort order of the index (status, sales DESC) matches as a logical consequence. So it is going to read all qualifying rows, sort and pick the top 3.

By adding status to the ORDER BY you enable the query planner to read the top 3 entries from the index directly. Expect a speed-up by several orders of magnitude.

Tested with PostgreSQL 8.4 and 9.1.

Upvotes: 13

I think a b-tree index is still your best bet. I could be wrong, though. I think I would test two things.

First, a partial index on category that excludes 'cat3' and 'cat7'.

CREATE INDEX index_products_category ON products (category)
  WHERE category NOT IN ('cat3','cat7');

Second, a descending sort on sales.

CREATE INDEX index_products_sales ON products (sales DESC);

Either one of these might slow down other queries, though, so you might need one or both of these in addition to the existing indexes.

Upvotes: 2

Related Questions