Bruce
Bruce

Reputation: 11

query the jsonb array data by specific field condition with OR and ORDER BY constraint really slow

I have a postgresql table named events

Name Data Type
id character varying
idx integer
module character varying
method character varying
idx integer
block_height integer
data jsonb

where data is like

["hex_address", 101, 1001, 51637660000, 51324528436, 6003, 235458709417729, 234683610487930] //trade method
["hex_address", 200060013, 1, 250000000000, 3176359357709794, 6006, "0x00000000000000001431f1724a499fcd", 114440794831460] //add method
["hex_address", 200060013, 1, 42658905229340, 407285893749, "0x000000000000000000110204f76c06e2", 6006, 121017475390243, "0x000000000000000013bd9463821aedee"] //remove method

And the table is about 3 million items.The indexes have been created

CREATE INDEX IDX_event_multicolumn_index ON public.events ("module", "method", "block_height") INCLUDE ("id", "idx", "data");
CREATE INDEX event_data_gin_index ON public.events USING gin (data)

When I run the SQL SELECT data FROM public.events WHERE module = 'amm' AND (((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved')) OR ((data::jsonb->5 IN ('6002', '6003') AND method IN ('Traded', 'LiquidityAdded')))) ORDER BY block_height DESC, idx DESC LIMIT 500;

it'll take about 1 minute. the explain analyze result(this is another table with less items):

"Limit  (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.484..14019.524 rows=100 loops=1)"
"  ->  Sort  (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.477..14019.504 rows=100 loops=1)"
"        Sort Key: block_height DESC, idx DESC"
"        Sort Method: top-N heapsort  Memory: 128kB"
"        ->  Bitmap Heap Scan on events  (cost=114.28..1909.15 rows=5 width=334) (actual time=703.038..13957.503 rows=25625 loops=1)"
"              Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"              Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"              Rows Removed by Filter: 9435"
"              Heap Blocks: exact=28532"
"              ->  BitmapOr  (cost=114.28..114.28 rows=462 width=0) (actual time=696.569..696.580 rows=0 loops=1)"
"                    ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..4.59 rows=3 width=0) (actual time=24.375..24.382 rows=896 loops=1)"
"                          Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
"                    ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..109.69 rows=459 width=0) (actual time=672.191..672.191 rows=34164 loops=1)"
"                          Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"

If I delete the multicolumn index, it does be faster and take about 20 seconds.

"Gather Merge  (cost=477713.00..477720.00 rows=60 width=130) (actual time=22151.357..22210.826 rows=79864 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Sort  (cost=476712.97..476713.05 rows=30 width=130) (actual time=22090.960..22097.308 rows=26621 loops=3)"
"        Sort Key: block_height"
"        Sort Method: external merge  Disk: 5400kB"
"        Worker 0:  Sort Method: external merge  Disk: 5264kB"
"        Worker 1:  Sort Method: external merge  Disk: 5416kB"
"        ->  Parallel Seq Scan on events  (cost=0.00..476712.24 rows=30 width=130) (actual time=5.151..21985.878 rows=26621 loops=3)"
"              Filter: (((module)::text = 'amm'::text) AND ((((data -> 6) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))))"
"              Rows Removed by Filter: 2858160"
"Planning Time: 0.559 ms"
"Execution Time: 22217.351 ms"

Then I try to create the specific field multicolumn index:

create EXTENSION btree_gin;
CREATE INDEX extcondindex ON public.events USING gin (((data -> 5)), ((data -> 6)), module, method);

The result is just same with the origin multi column index. If I remove one of the OR constraint SELECT data FROM public.events WHERE module = 'amm' AND (((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved'))) ORDER BY block_height DESC, idx DESC LIMIT 500; It's fast enough and takes about 3 seconds.

I want to know why the multi column index slow down the query, and how should I add the index for the specific field to optimize my query.

The io timing and buffer analyze for the multi-column index table

"Sort  (cost=1916.98..1916.99 rows=5 width=142) (actual time=14204.316..14210.109 rows=25683 loops=1)"
"  Sort Key: block_height"
"  Sort Method: external merge  Disk: 5264kB"
"  Buffers: shared hit=93 read=31211, temp read=658 written=659"
"  I/O Timings: read=13533.823"
"  ->  Bitmap Heap Scan on events  (cost=114.30..1916.92 rows=5 width=142) (actual time=926.714..14156.662 rows=25683 loops=1)"
"        Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"        Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"        Rows Removed by Filter: 9503"
"        Heap Blocks: exact=28626"
"        Buffers: shared hit=90 read=31211"
"        I/O Timings: read=13533.823"
"        ->  BitmapOr  (cost=114.30..114.30 rows=464 width=0) (actual time=919.777..919.779 rows=0 loops=1)"
"              Buffers: shared hit=27 read=2648"
"              I/O Timings: read=892.499"
"              ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..4.59 rows=3 width=0) (actual time=32.255..32.255 rows=898 loops=1)"
"                    Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
"                    Buffers: shared hit=7 read=72"
"                    I/O Timings: read=30.095"
"              ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..109.71 rows=461 width=0) (actual time=887.519..887.520 rows=34288 loops=1)"
"                    Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
"                    Buffers: shared hit=20 read=2576"
"                    I/O Timings: read=862.404"
"Planning:"
"  Buffers: shared hit=230"
"Planning Time: 0.663 ms"
"Execution Time: 14214.038 ms"

Upvotes: 1

Views: 45

Answers (0)

Related Questions