xtpu
xtpu

Reputation: 55

PostgreSQL: Why is this query not using my index?

I'm performing this query on a database (all the numbers and column names are made up):

select * from t where a=1 and b=11 and c!=5 and d<8

t has an index:

create index i on t (a,b,c,d)

When I run "EXPLAIN ANALYZE", the query performs a sequential scan and takes approximately 55ms to do this. If I modify the query like this:

select * from t where a=1 and b=11 and c=5 and d<8
                                       ^

It uses the index and completes in 0.5 ms. So it must be the NOT EQUALS, right? Not so, because if I do this query:

select * from t where a=1 and b=11 and c=5 and d!=8
                                               ^

The query still uses the index. But if I try this, no index:

select * from t where a=1 and b=11 and c<5 and d<8
                                       ^

So why is Postgres behaving the way it is? This is very strange to me.

Upvotes: 2

Views: 1151

Answers (2)

Markus Winand
Markus Winand

Reputation: 8706

As you already realized, the problem is related to using operators other than equals. An index can only be used most efficiently for the leftmost columns that are compared with by equals (plus one range condition).

In your example:

create index i on t (a,b,c,d);
where a=1 and b=11 and c!=5 and d<8;

It can use the index only for a and b efficiently. That means the DB fetches all rows matching the a and b condition and then checks each row against the remaining conditions.

When you change the filter on c to equals, it fetches (potentially) less rows (only those matching a and b and c) and then checks those (fewer) rows against the d filter. Using the index is more efficient in this case.

In general, the PostgreSQL query planner evaluates both options: (1) using the index; (2) doing a SeqScan. For both, it calculates a cost value — the higher it is the worse is the expected performance. Consequently, it takes the one with the smaller cost value. This is how it decides to use the index or not, there is no fixed threshold.

Finally, is wrote "plus one range condition" above. That means that it can not only use the index in the most efficient way if you are using equals signs, but also for one single range condition.

Considering that you have one single range condition in your query, I'd suggest to change the index like this:

create index i on t (a,b,d,c);

Now it can use the the filters on a and b and d efficiently with the index and only needs to filter the rows away where c!=5. Although this index can be used more efficiently for your query as your original one, it doesn't automatically mean PG will use it. It depends on the cost estimates. But give it a try.

Finally, if this isn't fast enought and the value 5 you are using in the expression c!=5 is constant, you might consider a partial index:

 create index i on t (a,b,d)
        where c!=5;

You could do that with all other columns too, if the values you compare them against are constants.

References:

Upvotes: 2

dominik
dominik

Reputation: 5935

I'd say that it does not use the index for the first query because the index does not really help because nearly the whole table matches. A scan of the whole table is faster in this case. The differences between the last two queries is that the index is probably only used if the expected result size is below a certain threshold. A query with exact match will most likely yield fewer results than one using less than which still yields fewer than not equals selections.

Having said this, query optimizers are a highly complex piece of software and often create surprising results.

Upvotes: 0

Related Questions