shuba.ivan

Reputation: 4061

How to optimize query if I already use GIN index

I use PostgreSQL 11.8 and I am trying to provide full-text search over several columns. For that I created a GIN index over multiple columns concatenated together. After my database grew to 344,747 rows in the products table, the query became slow: the example below takes roughly 4.6 s. The EXPLAIN ANALYZE output shows my index is used (a Bitmap Index Scan on npdbcs_swedish_custom_index is present), but the query is still slow. If I read the plan correctly, most of the time is spent on grouping. Does anyone have an approach or suggestion for optimizing this query? I can't imagine how it will behave when my database grows to 10 million products. My biggest hope is the Planning Time: 0.790 ms reported by ANALYZE. Is that possible?

->  GroupAggregate  (cost=27625.67..47476.93 rows=10284 width=928) (actual time=32.328..4490.279 rows=6634 loops=1)

my index

create index npdbcs_swedish_custom_index on products
                    using GIN(to_tsvector('pg_catalog.swedish', name||price||description||brand||category||shop))

and my query

 EXPLAIN ANALYZE
        SELECT                         
            products_alias.id,
            products_alias.sku,
            products_alias.name AS "name",
            products_alias.description,
            products_alias.category,
            products_alias.price,
            products_alias.shipping,
            products_alias.currency,
            products_alias.instock,
            products_alias.product_url AS "productUrl",
            products_alias.image_url AS "imageUrl",
            products_alias.tracking_url AS "trackingUrl",
            products_alias.brand,
            products_alias.shop,
            products_alias.original_price AS "originalPrice",
            products_alias.ean,
            products_alias.manufacturer_article_number AS "manufacturerArticleNumber",
            products_alias.extras,
            products_alias.created_at AS "createdAt",
            products_alias.brand_relation_id AS "brandRelationId",
            products_alias.shop_relation_id AS "shopRelationId",
            array_agg(DISTINCT cpt.category_id) AS categoryIds,
            COUNT(DISTINCT uip.id) as "numberOfEntries",
            ts_rank_cd(to_tsvector('pg_catalog.swedish', products_alias.name||products_alias.price||products_alias.description||products_alias.brand||products_alias.category||products_alias.shop), to_tsquery('pg_catalog.swedish', 'AeroMoov&Air&Layer™&Group&2&Sittdyna&Mörkgrå&One&Size:*|329.00|AeroMoov|Liggdelar|Duovagnar|Barnvagnar|Solskydd')) AS rank

        FROM products products_alias 
        LEFT JOIN user_ip_product uip on uip.products_id = products_alias.id 
                LEFT JOIN product_category cpt on cpt.product_id = products_alias.id
                WHERE to_tsvector('pg_catalog.swedish', products_alias.name||products_alias.price||products_alias.description||products_alias.brand||products_alias.category||products_alias.shop) @@ to_tsquery('pg_catalog.swedish', 'AeroMoov&Air&Layer™&Group&2&Sittdyna&Mörkgrå&One&Size:*|329.00|AeroMoov|Liggdelar|Duovagnar|Barnvagnar|Solskydd')             
        AND products_alias.id != 810429                                         
        GROUP BY products_alias.id ORDER BY rank DESC                                          
        LIMIT 4
        OFFSET 0;



Limit  (cost=47631.19..47631.20 rows=4 width=928) (actual time=4401.654..4401.656 rows=4 loops=1)
  ->  Sort  (cost=47631.19..47656.90 rows=10284 width=928) (actual time=4401.652..4401.653 rows=4 loops=1)
        Sort Key: (ts_rank_cd(to_tsvector('swedish'::regconfig, ((((((products_alias.name)::text || (products_alias.price)::text) || products_alias.description) || (products_alias.brand)::text) || (products_alias.category)::text) || (products_alias.shop)::text)), '''aeromoov'' & ''air'' & ''layer'' & ''group'' & ''2'' & ''sittdyn'' & ''mörkgrå'' & ''one'' & ''siz'':* | ''329.00'' | ''aeromoov'' | ''liggdel'' | ''duovagn'' | ''barnvagn'' | ''solskyd'''::tsquery)) DESC
        Sort Method: top-N heapsort  Memory: 31kB
        ->  GroupAggregate  (cost=27625.67..47476.93 rows=10284 width=928) (actual time=31.780..4382.917 rows=6634 loops=1)
              Group Key: products_alias.id
              ->  Gather Merge  (cost=27625.67..44420.18 rows=20203 width=892) (actual time=31.096..88.389 rows=13451 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Nested Loop Left Join  (cost=26625.65..41088.23 rows=8418 width=892) (actual time=24.549..52.293 rows=4484 loops=3)
                          ->  Merge Left Join  (cost=26625.22..26646.92 rows=4285 width=888) (actual time=24.513..29.039 rows=2212 loops=3)
                                Merge Cond: (products_alias.id = uip.products_id)
                                ->  Sort  (cost=26622.18..26632.89 rows=4285 width=884) (actual time=24.430..26.065 rows=2211 loops=3)
                                      Sort Key: products_alias.id
                                      Sort Method: external merge  Disk: 2808kB
                                      Worker 0:  Sort Method: quicksort  Memory: 3261kB
                                      Worker 1:  Sort Method: quicksort  Memory: 2112kB
                                      ->  Parallel Bitmap Heap Scan on products products_alias  (cost=283.70..26363.68 rows=4285 width=884) (actual time=8.825..17.986 rows=2211 loops=3)
                                            Recheck Cond: (to_tsvector('swedish'::regconfig, ((((((name)::text || (price)::text) || description) || (brand)::text) || (category)::text) || (shop)::text)) @@ '''aeromoov'' & ''air'' & ''layer'' & ''group'' & ''2'' & ''sittdyn'' & ''mörkgrå'' & ''one'' & ''siz'':* | ''329.00'' | ''aeromoov'' | ''liggdel'' | ''duovagn'' | ''barnvagn'' | ''solskyd'''::tsquery)
                                            Filter: (id <> 810429)
                                            Rows Removed by Filter: 0
                                            Heap Blocks: exact=2563
                                            ->  Bitmap Index Scan on npdbcs_swedish_custom_index  (cost=0.00..281.13 rows=10284 width=0) (actual time=10.858..10.859 rows=6635 loops=1)
                                                  Index Cond: (to_tsvector('swedish'::regconfig, ((((((name)::text || (price)::text) || description) || (brand)::text) || (category)::text) || (shop)::text)) @@ '''aeromoov'' & ''air'' & ''layer'' & ''group'' & ''2'' & ''sittdyn'' & ''mörkgrå'' & ''one'' & ''siz'':* | ''329.00'' | ''aeromoov'' | ''liggdel'' | ''duovagn'' | ''barnvagn'' | ''solskyd'''::tsquery)
                                ->  Sort  (cost=3.05..3.18 rows=53 width=8) (actual time=0.070..0.086 rows=59 loops=3)
                                      Sort Key: uip.products_id
                                      Sort Method: quicksort  Memory: 27kB
                                      Worker 0:  Sort Method: quicksort  Memory: 27kB
                                      Worker 1:  Sort Method: quicksort  Memory: 27kB
                                      ->  Seq Scan on user_ip_product uip  (cost=0.00..1.53 rows=53 width=8) (actual time=0.030..0.040 rows=59 loops=3)
                          ->  Index Scan using idx_cdfc73564584665a on product_category cpt  (cost=0.42..3.34 rows=3 width=8) (actual time=0.007..0.009 rows=2 loops=6636)
                                Index Cond: (product_id = products_alias.id)
Planning Time: 0.660 ms
Execution Time: 4402.940 ms

It would be great if it took 0.660 ms. How can I optimize this query?

Upvotes: 0

Views: 300

Answers (2)

jjanes

Reputation: 44147

would be great if it spent 0.660 ms.

The planning time is how much time the planner spent working out the execution plan. It is not how long the planner thought executing that plan would or should take. There is no reason to think your hope here is achievable; those two numbers represent entirely different things. It is like comparing the time you spend tying your shoes with the time it takes to run a marathon. Sure, it would be nice to run a marathon in the time it takes to tie your shoes, but there is no reason to think that is achievable.

I'm guessing most of the time is going either to computing ts_rank_cd, or to gathering the (possibly TOASTed) inputs necessary to compute it. What happens if you remove the ts_rank_cd and choose something else to ORDER BY? EXPLAIN VERBOSE on a simplified version of your query does suggest to me that the actual evaluation of ts_rank_cd is occurring in the correct plan node (i.e. the slow one) for this explanation to be plausible.
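One way to run that experiment is to keep the query shape the same but swap the expensive rank expression for a cheap sort key. A sketch (the tsquery is abbreviated here; substitute the full one from the question):

```sql
EXPLAIN ANALYZE
SELECT p.id,
       array_agg(DISTINCT cpt.category_id) AS category_ids,
       COUNT(DISTINCT uip.id)              AS number_of_entries
FROM products p
LEFT JOIN user_ip_product uip ON uip.products_id = p.id
LEFT JOIN product_category cpt ON cpt.product_id = p.id
WHERE to_tsvector('pg_catalog.swedish',
          p.name||p.price||p.description||p.brand||p.category||p.shop)
      @@ to_tsquery('pg_catalog.swedish', 'aeromoov & air')  -- abbreviated
  AND p.id != 810429
GROUP BY p.id
ORDER BY p.id DESC   -- cheap sort key instead of ts_rank_cd
LIMIT 4;
```

If this version drops from ~4.4 s to something far smaller, that points the finger at ts_rank_cd and at fetching its (possibly TOASTed) inputs.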

One thing that might make it faster is to save the computed tsvector into the table as a real column, rather than computing it dynamically. The extra storage needed is, of course, the trade-off there.
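PostgreSQL 11 has no generated columns (those arrived in 12), so on your version a stored tsvector would be kept up to date with a trigger. A minimal sketch; the column, function, and trigger names are made up, and I also added coalesce and space separators, which the original expression lacks (without separators adjacent values get glued into one lexeme, and a single NULL column makes the whole tsvector NULL):

```sql
ALTER TABLE products ADD COLUMN search_tsv tsvector;

-- Recompute the stored tsvector on every insert/update.
CREATE FUNCTION products_search_tsv_refresh() RETURNS trigger AS $$
BEGIN
    NEW.search_tsv := to_tsvector('pg_catalog.swedish',
        coalesce(NEW.name, '')        || ' ' ||
        coalesce(NEW.price::text, '') || ' ' ||
        coalesce(NEW.description, '') || ' ' ||
        coalesce(NEW.brand, '')       || ' ' ||
        coalesce(NEW.category, '')    || ' ' ||
        coalesce(NEW.shop, ''));
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_search_tsv_trg
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE FUNCTION products_search_tsv_refresh();

-- Backfill existing rows (any UPDATE fires the trigger), then index the column.
UPDATE products SET name = name;
CREATE INDEX products_search_tsv_idx ON products USING GIN (search_tsv);
```

The query would then use `search_tsv @@ to_tsquery(...)` in the WHERE clause and `ts_rank_cd(search_tsv, ...)` for the rank, avoiding the per-row to_tsvector computation entirely.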

I can't imagine how it will work when my db grows to 10 million products

As your products table grows, is your WHERE condition going to still return about the same number of rows, or will it return about the same proportion of the table?

Upvotes: 0

Laurenz Albe

Reputation: 246403

Try increasing work_mem in the hope that you can get a more efficient hash aggregate.
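The plan also shows one worker spilling a sort to disk (external merge, 2808kB), so a session-level experiment is cheap to try; the value below is a guess, pick one your server can spare per sort operation:

```sql
SET work_mem = '64MB';  -- session-local; does not affect other connections
-- ... re-run the EXPLAIN ANALYZE from the question here ...
RESET work_mem;
```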

I admit that I find it surprising that the time is spent in the group aggregate ...

Upvotes: 1
