Nulik

Reputation: 7350

Postgres is using full scan instead of index?

I have two tables: one with 150 million rows, the other with 1 million. The subset of rows I am asking for is only about 400 rows. Despite that, I am getting a "full scan" in EXPLAIN ANALYZE. The query is:

with txs as (
  select 
       tx_id 
  from pol_tok_id_ops 
  where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
) 
select 
       e20t.time_stamp,e20t.amount 
from erc20_transf e20t 
cross join txs 
where txs.tx_id=e20t.tx_id;

Explain reports:

 Merge Join  (cost=38689523.80..103962768.64 rows=4325899145 width=40)
   Merge Cond: (txs.tx_id = e20t.tx_id)
   CTE txs
     ->  Bitmap Heap Scan on pol_tok_id_ops  (cost=316.12..15969.27 rows=5622 width=8)
           Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
           ->  Bitmap Index Scan on pol_tokidops_cond_idx  (cost=0.00..314.72 rows=5622 width=0)
                 Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
   ->  Sort  (cost=462.60..476.66 rows=5622 width=8)
         Sort Key: txs.tx_id
         ->  CTE Scan on txs  (cost=0.00..112.44 rows=5622 width=8)
   ->  Materialize  (cost=38673091.92..39442551.04 rows=153891823 width=48)
         ->  Sort  (cost=38673091.92..39057821.48 rows=153891823 width=48)
               Sort Key: e20t.tx_id
               ->  Seq Scan on erc20_transf e20t  (cost=0.00..3543917.23 rows=153891823 width=48)

Tables are defined as:

\d erc20_transf
                                        Table "public.erc20_transf"
    Column    |           Type           | Collation | Nullable |                 Default                  
--------------+--------------------------+-----------+----------+------------------------------------------
 id           | bigint                   |           | not null | nextval('erc20_transf_id_seq'::regclass)
 evtlog_id    | bigint                   |           | not null | 
 block_num    | bigint                   |           | not null | 
 time_stamp   | timestamp with time zone |           |          | 
 tx_id        | bigint                   |           | not null | 
 contract_aid | bigint                   |           | not null | 
 from_aid     | bigint                   |           |          | 0
 to_aid       | bigint                   |           |          | 0
 amount       | numeric                  |           |          | 0.0
Indexes:
    "erc20_transf_pkey" PRIMARY KEY, btree (id)
    "erc20_transf_evtlog_id_key" UNIQUE CONSTRAINT, btree (evtlog_id)
    "erc20_tr_ctrct_idx" btree (contract_aid)
    "erc20_transf_from_idx" btree (from_aid)
    "erc20_transf_to_idx" btree (to_aid)
    "erc20_tx_idx" btree (tx_id)
Foreign-key constraints:
    "erc20_transf_evtlog_id_fkey" FOREIGN KEY (evtlog_id) REFERENCES evt_log(id) ON DELETE CASCADE
Referenced by:
    TABLE "erc20_bal" CONSTRAINT "erc20_bal_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES erc20_transf(id) ON DELETE CASCADE
Triggers:
    erc20_transf_delete AFTER DELETE ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_delete()
    erc20_transf_insert AFTER INSERT ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_insert()


 \d pol_tok_id_ops
                                 Table "public.pol_tok_id_ops"
      Column      |  Type   | Collation | Nullable |                  Default                   
------------------+---------+-----------+----------+--------------------------------------------
 id               | bigint  |           | not null | nextval('pol_tok_id_ops_id_seq'::regclass)
 evtlog_id        | bigint  |           |          | 
 tx_id            | bigint  |           |          | 
 parent_split_id  | bigint  |           |          | 
 parent_merge_id  | bigint  |           |          | 
 parent_redeem_id | bigint  |           |          | 
 contract_aid     | bigint  |           |          | 
 outcome_idx      | integer |           | not null | 
 condition_id     | text    |           | not null | 
 token_id_hex     | text    |           | not null | 
 token_from       | text    |           | not null | 
 token_to         | text    |           | not null | 
 token_amount     | numeric |           | not null | 
Indexes:
    "pol_tok_id_ops_pkey" PRIMARY KEY, btree (id)
    "pol_tokidops_cond_idx" btree (condition_id)
    "pol_tokidops_tx_idx" btree (tx_id)
Foreign-key constraints:
    "pol_tok_id_ops_parent_merge_id_fkey" FOREIGN KEY (parent_merge_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
    "pol_tok_id_ops_parent_redeem_id_fkey" FOREIGN KEY (parent_redeem_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
    "pol_tok_id_ops_parent_split_id_fkey" FOREIGN KEY (parent_split_id) REFERENCES pol_pos_split(id) ON DELETE CASCADE

So, why is it doing a full scan on 153 million records when the CTE scan reports only 5622 rows? Or is there a way to force an index scan somehow?

I am using Postgres 10.12 (can't upgrade, it's a production machine).
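As a quick diagnostic (not a production setting), the planner can be discouraged from sequential scans for the current session only, to check whether an index plan is possible at all:

set enable_seqscan = off;
-- re-run the EXPLAIN ANALYZE here, then restore the default:
reset enable_seqscan;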

EDIT:

Some benchmarks:

psql=> explain analyze select * from erc20_transf WHERE tx_id  in (1111111,2222222,333333,555555,666666);
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on erc20_transf  (cost=71721.16..2161950.95 rows=3852686 width=96) (actual time=1.420..1.482 rows=3 loops=1)
   Recheck Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on erc20_tx_idx  (cost=0.00..70757.99 rows=3852686 width=0) (actual time=1.386..1.386 rows=3 loops=1)
         Index Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
 Planning time: 1.490 ms
 Execution time: 1.557 ms
(7 rows)

psql=> explain analyze select tx_id from pol_tok_id_ops where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pol_tok_id_ops  (cost=431.34..21552.04 rows=7586 width=8) (actual time=1.431..5.851 rows=340 loops=1)
   Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
   Heap Blocks: exact=133
   ->  Bitmap Index Scan on pol_tokidops_cond_idx  (cost=0.00..429.45 rows=7586 width=0) (actual time=1.355..1.355 rows=340 loops=1)
         Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
 Planning time: 1.585 ms
 Execution time: 5.926 ms
(7 rows)


Upvotes: 1

Views: 2677

Answers (2)

Belayer

Reputation: 14861

Perhaps. The problem would seem to be the JOIN. You have:

    ... 
    select 
           e20t.time_stamp,e20t.amount 
    from erc20_transf e20t 
    cross join txs 
    where txs.tx_id=e20t.tx_id;

JOINs are typically performed before the WHERE clause, and a CROSS JOIN combines every row of erc20_transf with every row of txs before the WHERE is applied. Try an INNER JOIN:

    ... 
    select 
           e20t.time_stamp,e20t.amount 
    from erc20_transf e20t 
    join txs on (txs.tx_id=e20t.tx_id);
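Alternatively, since this is Postgres 10 — where a WITH query is always materialized and acts as an optimization fence — inlining the lookup avoids the CTE altogether and lets the planner consider erc20_tx_idx (a sketch using the tables and the literal from the question):

    select 
           e20t.time_stamp, e20t.amount 
    from erc20_transf e20t 
    join pol_tok_id_ops p on p.tx_id = e20t.tx_id 
    where p.condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';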

Upvotes: 0

melcher

Reputation: 1601

Run `vacuum analyze [table-name]` (or just `analyze [table-name]`) to update the query planner statistics.

For large or frequently updated tables, PostgreSQL recommends tuning the auto-analyze (and, more generally, auto-vacuum) settings to fit your needs. If Postgres is using out-of-date table statistics that don't reflect the current contents and distribution of the table, it can come up with bad to terrible query plans.

Also, for unevenly distributed data in a large table (where the sampling Postgres uses won't always accurately reflect the underlying data, depending on the sample), you may need to raise the statistics target. See Analyzing Extreme Distributions in Postgres and Analyze Strategy for big tables in Postgres.
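As a concrete sketch (table and column names are taken from the question; 1000 is an illustrative statistics target, not a recommendation — the default is 100):

    -- Refresh planner statistics for both tables:
    ANALYZE pol_tok_id_ops;
    ANALYZE erc20_transf;

    -- Sample the skewed column more heavily, then re-analyze
    -- so the new target takes effect:
    ALTER TABLE pol_tok_id_ops ALTER COLUMN condition_id SET STATISTICS 1000;
    ANALYZE pol_tok_id_ops;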

Upvotes: 1
