Reputation: 7350
I have 2 tables: one has 150M records, the other 1 million. The subset of rows I am asking for is only about 400 rows. Despite that, I am getting a "full scan" in EXPLAIN ANALYZE. The query is:
with txs as (
select
tx_id
from pol_tok_id_ops
where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
)
select
e20t.time_stamp,e20t.amount
from erc20_transf e20t
cross join txs
where txs.tx_id=e20t.tx_id;
Explain reports:
Merge Join (cost=38689523.80..103962768.64 rows=4325899145 width=40)
Merge Cond: (txs.tx_id = e20t.tx_id)
CTE txs
-> Bitmap Heap Scan on pol_tok_id_ops (cost=316.12..15969.27 rows=5622 width=8)
Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
-> Bitmap Index Scan on pol_tokidops_cond_idx (cost=0.00..314.72 rows=5622 width=0)
Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
-> Sort (cost=462.60..476.66 rows=5622 width=8)
Sort Key: txs.tx_id
-> CTE Scan on txs (cost=0.00..112.44 rows=5622 width=8)
-> Materialize (cost=38673091.92..39442551.04 rows=153891823 width=48)
-> Sort (cost=38673091.92..39057821.48 rows=153891823 width=48)
Sort Key: e20t.tx_id
-> Seq Scan on erc20_transf e20t (cost=0.00..3543917.23 rows=153891823 width=48)
Tables are defined as:
\d erc20_transf
Table "public.erc20_transf"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+------------------------------------------
id | bigint | | not null | nextval('erc20_transf_id_seq'::regclass)
evtlog_id | bigint | | not null |
block_num | bigint | | not null |
time_stamp | timestamp with time zone | | |
tx_id | bigint | | not null |
contract_aid | bigint | | not null |
from_aid | bigint | | | 0
to_aid | bigint | | | 0
amount | numeric | | | 0.0
Indexes:
"erc20_transf_pkey" PRIMARY KEY, btree (id)
"erc20_transf_evtlog_id_key" UNIQUE CONSTRAINT, btree (evtlog_id)
"erc20_tr_ctrct_idx" btree (contract_aid)
"erc20_transf_from_idx" btree (from_aid)
"erc20_transf_to_idx" btree (to_aid)
"erc20_tx_idx" btree (tx_id)
Foreign-key constraints:
"erc20_transf_evtlog_id_fkey" FOREIGN KEY (evtlog_id) REFERENCES evt_log(id) ON DELETE CASCADE
Referenced by:
TABLE "erc20_bal" CONSTRAINT "erc20_bal_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES erc20_transf(id) ON DELETE CASCADE
Triggers:
erc20_transf_delete AFTER DELETE ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_delete()
erc20_transf_insert AFTER INSERT ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_insert()
\d pol_tok_id_ops
Table "public.pol_tok_id_ops"
Column | Type | Collation | Nullable | Default
------------------+---------+-----------+----------+--------------------------------------------
id | bigint | | not null | nextval('pol_tok_id_ops_id_seq'::regclass)
evtlog_id | bigint | | |
tx_id | bigint | | |
parent_split_id | bigint | | |
parent_merge_id | bigint | | |
parent_redeem_id | bigint | | |
contract_aid | bigint | | |
outcome_idx | integer | | not null |
condition_id | text | | not null |
token_id_hex | text | | not null |
token_from | text | | not null |
token_to | text | | not null |
token_amount | numeric | | not null |
Indexes:
"pol_tok_id_ops_pkey" PRIMARY KEY, btree (id)
"pol_tokidops_cond_idx" btree (condition_id)
"pol_tokidops_tx_idx" btree (tx_id)
Foreign-key constraints:
"pol_tok_id_ops_parent_merge_id_fkey" FOREIGN KEY (parent_merge_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
"pol_tok_id_ops_parent_redeem_id_fkey" FOREIGN KEY (parent_redeem_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
"pol_tok_id_ops_parent_split_id_fkey" FOREIGN KEY (parent_split_id) REFERENCES pol_pos_split(id) ON DELETE CASCADE
So, why is it doing a full scan on 153 million records when the CTE scan reports only 5622 rows? Or is there maybe a way to force a scan on an index somehow?
I am using Postgres 10.12 (can't upgrade, it's a production machine).
EDIT:
Some benchmarks:
psql=> explain analyze select * from erc20_transf WHERE tx_id in (1111111,2222222,333333,555555,666666);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on erc20_transf (cost=71721.16..2161950.95 rows=3852686 width=96) (actual time=1.420..1.482 rows=3 loops=1)
Recheck Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
Heap Blocks: exact=3
-> Bitmap Index Scan on erc20_tx_idx (cost=0.00..70757.99 rows=3852686 width=0) (actual time=1.386..1.386 rows=3 loops=1)
Index Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
Planning time: 1.490 ms
Execution time: 1.557 ms
(7 rows)
psql=> explain analyze select tx_id from pol_tok_id_ops where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on pol_tok_id_ops (cost=431.34..21552.04 rows=7586 width=8) (actual time=1.431..5.851 rows=340 loops=1)
Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
Heap Blocks: exact=133
-> Bitmap Index Scan on pol_tokidops_cond_idx (cost=0.00..429.45 rows=7586 width=0) (actual time=1.355..1.355 rows=340 loops=1)
Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
Planning time: 1.585 ms
Execution time: 5.926 ms
(7 rows)
Upvotes: 1
Views: 2677
Reputation: 14861
Perhaps. It would seem to be the JOIN. You have
...
select
e20t.time_stamp,e20t.amount
from erc20_transf e20t
cross join txs
where txs.tx_id=e20t.tx_id;
JOINs are logically performed before the WHERE clause, and a CROSS JOIN will combine every row of erc20_transf with every row of txs before the WHERE is applied. Try an INNER JOIN:
...
select
e20t.time_stamp,e20t.amount
from erc20_transf e20t
join txs on (txs.tx_id=e20t.tx_id);
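On Postgres 10 (unlike 12+), a CTE is also an optimization fence: the planner always materializes it and cannot push the join condition down into it. So, in addition to the join rewrite above, a variant that inlines the lookup as a plain subquery may give the planner the freedom to drive an index scan on erc20_tx_idx from the ~400 matching tx_ids. A sketch against the same tables:

    -- Same result as the CTE version, but without the CTE fence;
    -- the planner can turn the IN (...) into a semi-join and use
    -- the erc20_tx_idx index on the big table.
    select e20t.time_stamp, e20t.amount
    from erc20_transf e20t
    where e20t.tx_id in (
        select tx_id
        from pol_tok_id_ops
        where condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
    );

Whether this actually picks a nested-loop/index plan still depends on the row estimates, so it is worth checking with EXPLAIN ANALYZE after statistics are up to date.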
Upvotes: 0
Reputation: 1601
Run vacuum analyze [table-name] (or just analyze [table-name]) to update the query planner statistics.
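For the tables in the question, that would be:

    vacuum analyze erc20_transf;
    vacuum analyze pol_tok_id_ops;
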
For large or frequently updated tables, the PostgreSQL documentation recommends tuning the auto-analyze (and generally auto-vacuum) settings to fit your needs. If Postgres is using out-of-date table statistics (ones that don't reflect the current content/distribution of the table), it can come up with bad-to-terrible query plans.
Also, for unevenly distributed data in a large table (where the sampling Postgres uses won't always accurately reflect the underlying data, depending on the sample), you may need to raise the statistics target. See Analyzing Extreme Distributions in Postgres and Analyze Strategy for big tables in Postgres.
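For example, to raise the per-column sample size on the join column (the default statistics target is 100; 1000 here is only an illustrative value, not a recommendation for this workload):

    -- Larger sample for tx_id, then rebuild the statistics
    alter table erc20_transf alter column tx_id set statistics 1000;
    analyze erc20_transf;
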
Upvotes: 1