Why does PostgreSQL not use the PK index?

Question

If I want to select 0.5% of the rows, or even 5% of the rows, from the following table via the PK, the query planner correctly chooses to use the PK index. Here is the table:

create table weather as
with numbers as(
select generate_series as id from generate_series(0,1048575))
select id, 
50 + 50*sin(id) as temperature_in_f, 
50 + 50*sin(id) as humidity_in_percent
from numbers;

alter table weather
add constraint pk_weather primary key(id);

vacuum analyze weather;

The stats are up-to-date, and the following query does use the PK index:

explain analyze select sum(w.id), sum(humidity_in_percent), count(*) 
from weather as w
where w.id between 1 and 66720;

Suppose, however, that we need to join this table with another, much smaller, one:

create table lightnings 
as
select id as weather_id
from weather
where humidity_in_percent between 99.99 and 100;

alter table lightnings
add constraint pk_lightnings
primary key(weather_id);

analyze lightnings;

Here is my join, in four logically equivalent forms:

explain analyze select sum(w.id), count(*) from weather as w
where w.humidity_in_percent between 99.99 and 100
and exists(select * from lightnings as l
  where l.weather_id=w.id);

explain analyze select sum(w.id), count(*) 
from weather as w
join lightnings as l
  on l.weather_id=w.id
where w.humidity_in_percent between 99.99 and 100;

explain analyze select sum(w.id), count(*) 
from lightnings as l
join weather as w
  on l.weather_id=w.id
where w.humidity_in_percent between 99.99 and 100;

-- replaced explicit join with where clause
explain analyze select sum(w.id), count(*) 
from lightnings as l, weather as w
where w.humidity_in_percent between 99.99 and 100
and l.weather_id=w.id;

Unfortunately the query planner resorts to scanning the whole weather table:

"Aggregate  (cost=22645.68..22645.69 rows=1 width=4) (actual time=167.427..167.427 rows=1 loops=1)"
"  ->  Hash Join  (cost=180.12..22645.52 rows=32 width=4) (actual time=2.500..166.444 rows=6672 loops=1)"
"        Hash Cond: (w.id = l.weather_id)"
"        ->  Seq Scan on weather w  (cost=0.00..22407.64 rows=5106 width=4) (actual time=0.013..158.593 rows=6672 loops=1)"
"              Filter: ((humidity_in_percent >= 99.99::double precision) AND (humidity_in_percent <= 100::double precision))"
"              Rows Removed by Filter: 1041904"
"        ->  Hash  (cost=96.72..96.72 rows=6672 width=4) (actual time=2.479..2.479 rows=6672 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 235kB"
"              ->  Seq Scan on lightnings l  (cost=0.00..96.72 rows=6672 width=4) (actual time=0.009..0.908 rows=6672 loops=1)"
"Planning time: 0.326 ms"
"Execution time: 167.581 ms"

The query planner estimates that the humidity filter will select rows=5106 from the weather table, which is reasonably close to the actual value of 6672. If I select this number of rows from the weather table by id, the PK index is used. If I select the same number of rows via a join with another table, the query planner scans the whole table instead.

What am I missing?

select version()
"PostgreSQL 9.4.0"

Edit: if I remove the condition on humidity, the query planner correctly recognizes that the condition on weather.id is quite selective, and chooses to use the index on PK:

explain analyze select sum(w.id), count(*) from weather as w
where exists(select * from lightnings as l
  where l.weather_id=w.id);
"Aggregate  (cost=14677.84..14677.85 rows=1 width=4) (actual time=37.200..37.200 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.42..14644.48 rows=6672 width=4) (actual time=0.022..36.189 rows=6672 loops=1)"
"        ->  Seq Scan on lightnings l  (cost=0.00..96.72 rows=6672 width=4) (actual time=0.011..0.868 rows=6672 loops=1)"
"        ->  Index Only Scan using pk_weather on weather w  (cost=0.42..2.17 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=6672)"
"              Index Cond: (id = l.weather_id)"
"              Heap Fetches: 0"
"Planning time: 0.321 ms"
"Execution time: 37.254 ms"

Yet adding the condition on humidity totally confuses the query planner.

David Aldridge · Accepted Answer

Expecting the optimiser to use an index on the PK of the larger table implies that you expect the query to be driven from the smaller table. Of course, you know that the rows that the smaller table will join to in the larger one are the same as those selected by the predicate on it, but the optimiser does not.

Look at the line on the plan:

Hash Join  (cost=180.12..22645.52 rows=32 width=4) (actual time=2.500..166.444 rows=6672 loops=1)

It expects 32 rows to result from the join, but 6672 actually result.
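
That figure of 32 is consistent with the planner treating the humidity filter and the join condition as independent of each other. As a rough sketch (assuming the usual equi-join estimate of one over the larger distinct count of the two join keys, with no nulls in either column; the way the numbers are combined here is my reconstruction, not something printed by the planner), the arithmetic can be reproduced like this:

```sql
-- Rough reconstruction of the join row estimate (an assumption about how
-- the figures combine, not output taken from the planner itself):
--   5106    = estimated weather rows passing the humidity filter
--   6672    = rows in lightnings
--   1048576 = distinct values of weather.id, the larger of the two key counts
select round(5106 * 6672 / 1048576.0) as estimated_join_rows;
```

This comes out at roughly 32.5, in line with the rows=32 shown on the Hash Join line. The planner has no statistic telling it that the rows passing the filter are exactly the rows that have a match in lightnings.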

Anyway, it pretty much has three options:

1. A full scan on the smaller table, and an index lookup on the larger, with the predicate being used to filter out rows subsequent to the join (and expecting most of the rows to then be filtered out).
2. A full scan on both tables, with rows being removed by the predicate on the larger table, and a hash join of the result.
3. A scan of the larger table with rows being removed by the predicate, and an index lookup on the smaller table that may fail to find a value.
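
If you want to see what the planner thinks of the other options, you can discourage the hash join for a single transaction and compare the estimated costs. This is a minimal sketch, assuming the tables from the question; enable_hashjoin and enable_mergejoin are standard planner settings, and switching them off heavily penalises those join methods rather than strictly forbidding them:

```sql
begin;

-- "set local" lasts only until the end of this transaction
set local enable_hashjoin = off;
set local enable_mergejoin = off;

-- same join as in the question; compare the top-level cost with the
-- cost=22645.68 reported for the hash join plan
explain analyze select sum(w.id), count(*)
from weather as w
join lightnings as l
  on l.weather_id = w.id
where w.humidity_in_percent between 99.99 and 100;

rollback;
```

These settings are meant for this kind of investigation, not as a permanent fix for a production query.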

The second of these has been judged to be the lowest cost, and I think it is correct to do so based on the evidence it has, as hash joins are very efficient for joining many rows.

Of course it would probably be more efficient to place an index on weather(humidity_in_percent,id) in this particular case, but I suspect that this is a modified version of your real situation (the sum of the id column?) so specific advice may not be applicable.
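
A minimal sketch of that index, assuming the table from the question (the index name ix_weather_humidity_id is a hypothetical one picked for this example):

```sql
-- ix_weather_humidity_id is a hypothetical name chosen for illustration
create index ix_weather_humidity_id
  on weather (humidity_in_percent, id);

-- refresh statistics so the planner sees the new index with current data
analyze weather;

-- re-run the join and check whether the scan on weather changes
explain analyze select sum(w.id), count(*)
from weather as w
join lightnings as l
  on l.weather_id = w.id
where w.humidity_in_percent between 99.99 and 100;
```

Because both the filtered column and the join key are in the index, the planner at least has the option of reading only the index for this query; whether it takes that option still depends on its cost estimates.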
