sdds

Reputation: 2051

PostgreSQL choosing a hash join rather than an index scan

I'm running PostgreSQL 12.6. I have a table delivery_info with 14,007,206 rows (I removed irrelevant parts of the DDL for brevity):

create table if not exists delivery_info
(
    id bigserial not null
        constraint delivery_info_pkey
            primary key,
    user_notification_id bigint
        constraint delivery_info__user_notification_id__fkey
            references notification_user
                on delete cascade,
    ...
    acknowledged boolean default false not null,
    status_change_date timestamp not null,
    ...
);

...

create index if not exists delivery_info__user_notification_id__index
    on delivery_info (user_notification_id);

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status asc, status_change_date desc, acknowledged asc)
    where (status = 1);

And notification_user with 35,503,013 rows:

create table if not exists notification_user
(
    id bigserial not null
        constraint notification_user__id__seq
            primary key,
    ...
);

The query in question does a join on the two tables with a WHERE clause on delivery_info:

SELECT *
FROM delivery_info AS d
INNER JOIN notification_user AS n ON d.user_notification_id = n.id
WHERE d.status = 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z';

Normally the planner chooses a hash join, with a sequential scan on notification_user and a bitmap index scan on delivery_info:

Gather  (cost=1211782.75..1987611.05 rows=2293631 width=122) (actual time=49921.908..123141.788 rows=2790 loops=1)
  Workers Planned: 4
  Workers Launched: 4
  Buffers: shared hit=24996 read=412218
  I/O Timings: read=317223.835
  ->  Parallel Hash Join  (cost=211782.75..758247.95 rows=573408 width=122) (actual time=49923.633..123072.227 rows=558 loops=5)
        Hash Cond: (n.id = d.user_notification_id)
        Buffers: shared hit=24993 read=412218
        I/O Timings: read=317223.835
        ->  Parallel Seq Scan on notification_user n  (cost=0.00..511671.22 rows=8896122 width=75) (actual time=9.874..90448.053 rows=7100603 loops=5)
              Buffers: shared hit=10492 read=412218
              I/O Timings: read=317223.835
        ->  Parallel Hash  (cost=204615.15..204615.15 rows=573408 width=47) (actual time=210.255..210.262 rows=558 loops=5)
              Buckets: 4194304  Batches: 1  Memory Usage: 33056kB
              Buffers: shared hit=14386
              ->  Parallel Bitmap Heap Scan on delivery_info d  (cost=43803.04..204615.15 rows=573408 width=47) (actual time=187.358..188.670 rows=558 loops=5)
                    Recheck Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (status = 1))
                    Filter: (NOT acknowledged)
                    Heap Blocks: exact=87
                    Buffers: shared hit=14386
                    ->  Bitmap Index Scan on delivery_info__status_change_date_acknowledged__index  (cost=0.00..43229.63 rows=2293631 width=0) (actual time=182.445..182.447 rows=2790 loops=1)
                          Index Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (acknowledged = false))
                          Buffers: shared hit=14259
Planning Time: 57.240 ms
Execution Time: 123147.866 ms

The same query analyzed with SET enable_seqscan = off:
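(Something along these lines; the EXPLAIN options shown are an assumption, based on the buffer counts in the output:)

SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM delivery_info AS d
INNER JOIN notification_user AS n ON d.user_notification_id = n.id
WHERE d.status = 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z';
RESET enable_seqscan;

This yields: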

Gather  (cost=1043803.60..2525242.24 rows=2293631 width=122) (actual time=156.124..186.178 rows=2790 loops=1)
  Workers Planned: 4
  Workers Launched: 4
  Buffers: shared hit=28349
  ->  Nested Loop  (cost=43803.60..1295879.14 rows=573408 width=122) (actual time=124.191..137.654 rows=558 loops=5)
        Buffers: shared hit=28349
        ->  Parallel Bitmap Heap Scan on delivery_info d  (cost=43803.04..204615.15 rows=573408 width=47) (actual time=124.141..125.410 rows=558 loops=5)
              Recheck Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (status = 1))
              Filter: (NOT acknowledged)
              Heap Blocks: exact=57
              Buffers: shared hit=14386
              ->  Bitmap Index Scan on delivery_info__status_change_date_acknowledged__index  (cost=0.00..43229.63 rows=2293631 width=0) (actual time=155.243..155.245 rows=2790 loops=1)
                    Index Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (acknowledged = false))
                    Buffers: shared hit=14259
        ->  Index Scan using notification_user__id__seq on notification_user n  (cost=0.56..1.90 rows=1 width=75) (actual time=0.007..0.007 rows=1 loops=2790)
              Index Cond: (id = d.user_notification_id)
              Buffers: shared hit=13963
Planning Time: 1.061 ms
Execution Time: 190.706 ms

I've also noticed that if I additionally set a lower bound on delivery_info.status_change_date (AND d.status_change_date > '2021-04-10T13:48:00.2234239Z'), the query planner apparently decides the scan on delivery_info becomes selective enough that using the index on notification_user is appropriate.
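The full query, for reference:

SELECT *
FROM delivery_info AS d
INNER JOIN notification_user AS n ON d.user_notification_id = n.id
WHERE d.status = 1
  AND d.acknowledged = false
  AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'
  AND d.status_change_date > '2021-04-10T13:48:00.2234239Z';

This produces the following plan: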

Nested Loop  (cost=0.99..468852.99 rows=155590 width=122) (actual time=0.074..243.885 rows=2790 loops=1)
  Buffers: shared hit=28412
  ->  Index Scan using delivery_info__status_change_date_acknowledged__index on delivery_info d  (cost=0.43..132601.82 rows=155590 width=47) (actual time=0.037..203.074 rows=2790 loops=1)
        Index Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (status_change_date > '2021-04-10 13:48:00.223424'::timestamp without time zone) AND (acknowledged = false))
        Buffers: shared hit=14452
  ->  Index Scan using notification_user__id__seq on notification_user n  (cost=0.56..2.16 rows=1 width=75) (actual time=0.007..0.007 rows=1 loops=2790)
        Index Cond: (id = d.user_notification_id)
        Buffers: shared hit=13960
Planning Time: 16.755 ms
Execution Time: 248.475 ms

Both queries (with and without the lower bound on delivery_info.status_change_date) return 2790 rows. So apparently the problem is that the query planner assumes the clause on status_change_date is very unselective, despite there being relatively few rows that satisfy all the clauses in the query. How can I fix this behavior? I'd prefer not to set a lower bound on status_change_date.

I did VACUUM ANALYZE on delivery_info, and I've also checked seq_page_cost and random_page_cost (both are set to 1). I also tried increasing the STATISTICS target on status_change_date and increasing default_statistics_target before running ANALYZE, all to no avail.
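Roughly what I ran (the target value of 1000 is just an example):

VACUUM ANALYZE delivery_info;
ALTER TABLE delivery_info ALTER COLUMN status_change_date SET STATISTICS 1000;
SET default_statistics_target = 1000;
ANALYZE delivery_info;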

Edit: per @jjanes's suggestion, I've added actual and estimated row counts for the different combinations of the WHERE-clause expressions:

clause                                                                                              actual      estimated
d.status = 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'   2790        2295101
d.status = 1 AND d.acknowledged = false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z'   119         571
d.status = 1 AND d.acknowledged != false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'  2891204     596341
d.status = 1 AND d.acknowledged != false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z'  0           148
d.status != 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'  11113008    8820447
d.status != 1 AND d.acknowledged = false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z'  3           2193
d.status != 1 AND d.acknowledged != false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z' 82          2291834
d.status != 1 AND d.acknowledged != false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z' 0           570
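
For each combination I took the actual count from a count(*) and the estimate from the row count reported by EXPLAIN, along these lines:

EXPLAIN ANALYZE
SELECT *
FROM delivery_info AS d
WHERE d.status = 1
  AND d.acknowledged = false
  AND d.status_change_date < '2021-04-16T13:48:00.2234239Z';
-- "rows=..." in the plan node is the estimate; "actual ... rows=..." is the true count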

It seems the estimated counts are wildly off the mark. I've run ANALYZE over and over; what did I miss?

Upvotes: 1

Views: 5500

Answers (4)

jjanes

Reputation: 44167

I had meant that you should get the counts with the different combinations of clauses omitted, rather than negated, but we can still reach the same conclusion with just a bit more math. We see that the columns are strongly dependent on one another: rows with a status of 1 are acknowledged at a far higher rate (99.9%) than rows overall are (20%), while the planner generally assumes the columns are independent.

This is what custom statistics were created to deal with. In this case, you would want:

CREATE STATISTICS foobar (MCV) ON acknowledged, status FROM delivery_info;
ANALYZE delivery_info;
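
On PostgreSQL 12 you can check what was built by looking at the pg_stats_ext view, something like:

SELECT statistics_name, most_common_vals, most_common_freqs
FROM pg_stats_ext
WHERE statistics_name = 'foobar';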

When I create data with the distribution you show, that works like magic to fix the estimates. But this depends on there being relatively few distinct status values (I used 2 through 11, evenly distributed, as the values for rows where status != 1). You said you tried creating statistics, but you didn't say exactly what you tried, nor whether you analyzed the table after creating them.

Custom statistics do not have multi-dimensional histograms. In this case, the thing you are relying on is the most-common-value (MCV) list. But when you include the timestamp, that is essentially a continuous variable (well, I'm assuming; perhaps you have it truncated to just the month or something, but I doubt that), and trying to compute an MCV list over a continuous variable is hopeless. So including the timestamp renders the custom statistics useless. This is essentially the same reason it also doesn't work if "status" has a large number of distinct values.

Upvotes: 3

SQLpro

Reputation: 5131

You use a

SELECT *

in your queries, and you give us only part of the table structure.

No index can fully cover the query unless all columns used in the query are part of the index definition.

So the question is: do you really need all the columns to be returned? If yes, your index must carry all the columns of the table, and for this you can use the INCLUDE clause (first introduced in Microsoft SQL Server, available in PostgreSQL since version 11). Otherwise, restrict the SELECT list to the minimal subset of columns you need.
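
For example (a sketch only; the included columns are illustrative, not the complete list you would need for SELECT *):

create index if not exists delivery_info__covering__index
    on delivery_info (status_change_date desc)
    include (acknowledged, user_notification_id)
    where (status = 1);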

Incidentally, ALWAYS give the complete DDL code.

Upvotes: 0

O. Jones

Reputation: 108651

Adding a column to, and reordering the columns in, your index should help.

Your query's WHERE clause applies these filters to your delivery_info table:

WHERE d.status = 1
  AND d.acknowledged = false
  AND d.status_change_date < timeconstant;

It then uses d.user_notification_id as a foreign key to access your other table.

A good bet for helping this query is to create a covering BTREE index, like this:

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status, 
                      acknowledged, 
                      status_change_date desc, 
                      user_notification_id) 
 where (status = 1);

Why? The query can random-access the index to the first eligible entry, then completely satisfy what it needs from that table by scanning the index. As an added bonus, the values you use for the foreign key will come from the index in ascending order, matching the primary key on the table you join. That should allow a merge join in place of a hash join, hopefully.

Your acknowledged column should come before your status_change_date column in the index, because you filter for equality on the former and a range on the latter.

Pro tip: SELECT * can be harmful to performance in these situations, because it forces the query to retrieve columns you might not need. List the columns you need in your SELECT clause.

Upvotes: 1

Frank Heikens

Reputation: 127086

The first thing that catches my attention is this index:

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status asc, status_change_date desc, acknowledged asc)
    where (status = 1);

There is no point in adding "status asc" to the key columns when, because of the "where (status = 1)" predicate, every indexed row has the same status value.

I would merge both indexes and would first try this one:

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status_change_date desc, user_notification_id, acknowledged)
    where (status = 1);

Another thing that might be of help is creating some additional statistics.
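
For example (a sketch; the statistics name is a placeholder, and the column pairing follows the correlation shown in the question):

CREATE STATISTICS delivery_info_stats (dependencies, mcv)
    ON status, acknowledged FROM delivery_info;
ANALYZE delivery_info;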

Upvotes: 1
