unnik
unnik

Reputation: 1153

Postgres GIN index not being used, SEQ scan done always

I have created a table to token search

CREATE TABLE tsvector_business_objects (
    id int4 NULL,
    "type" varchar NULL,
    value varchar NULL,
    label varchar NULL,
    synonyms _text NULL,
    label_tsvector tsvector NULL
);
CREATE INDEX label_tsvector_idx ON tsvector_business_objects USING gin (label_tsvector);
CREATE INDEX lower_case_synonym_tsvector_business_objects ON tsvector_business_objects USING gin (synonyms);
CREATE INDEX txt_label_tsvector_business_objects ON tsvector_business_objects USING gin (to_tsvector('english'::regconfig, lower((label)::text)));
CREATE INDEX txt_value_tsvector_business_objects ON tsvector_business_objects USING gin (to_tsvector('english'::regconfig, lower((value)::text)));
CREATE INDEX type_tsvector_business_objects ON tsvector_business_objects USING btree (type);
CREATE INDEX type_value_label_lower_case_tsvector_business_objects ON tsvector_business_objects USING btree (lower((type)::text), lower((value)::text), lower((label)::text));

I have some text like OID-0127820 in the data, when I search for the toke OID using ts_vector Postgres always uses sequence scan instead of using the GIN index.

There are 500k records with OID text. But when I disable the sequence scan the index is being used

SET enable_seqscan = ON;
explain (analyze,verbose,buffers,timing,costs) 
SELECT TYPE, value, label, synonyms, ((to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)) as partial_value_label_match, (lower(value) ='OID' OR lower(label) ='OID') as exact_value_label_match, (synonyms @> '{OID}') is true as synonym_match FROM tsvector_business_objects AS business_objects_alias
WHERE synonyms @> '{OID}'
OR (to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0)
OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)
OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) 
OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR (lower(value) ='OID' OR lower(label) ='OID')
limit 30



QUERY PLAN                                                                                                                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=0.00..41.91 rows=30 width=82) (actual time=3804.998..3805.163 rows=30 loops=1)                                                                                                                                                                    |
  Output: type, value, label, synonyms, (((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery))), (((lower((value)::text) = 'OID'::text) OR (lowe|
  Buffers: shared hit=21217                                                                                                                                                                                                                                    |
  ->  Seq Scan on mdlz_performancebenchmarking.tsvector_business_objects business_objects_alias  (cost=0.00..717278.98 rows=513425 width=82) (actual time=3804.997..3805.159 rows=30 loops=1)                                                                  |
        Output: type, value, label, synonyms, ((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery)), ((lower((value)::text) = 'OID'::text) OR (l|
        Filter: ((business_objects_alias.synonyms @> '{OID}'::text[]) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.label)::tex|
        Rows Removed by Filter: 575042                                                                                                                                                                                                                         |
        Buffers: shared hit=21217                                                                                                                                                                                                                              |
Planning time: 0.226 ms                                                                                                                                                                                                                                        |
Execution time: 3805.210 ms   
SET enable_seqscan = OFF;
explain (analyze,verbose,buffers,timing,costs) 
SELECT TYPE, value, label, synonyms, ((to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)) as partial_value_label_match, (lower(value) ='OID' OR lower(label) ='OID') as exact_value_label_match, (synonyms @> '{OID}') is true as synonym_match FROM tsvector_business_objects AS business_objects_alias
WHERE synonyms @> '{OID}'
OR (to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0)
OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)
OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) 
OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR (lower(value) ='OID' OR lower(label) ='OID')
limit 30



QUERY PLAN                                                                                                                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=66996.67..67036.72 rows=30 width=82) (actual time=312.131..312.251 rows=30 loops=1)                                                                                                                                                               |
  Output: type, value, label, synonyms, (((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery))), (((lower((value)::text) = 'OID'::text) OR (lowe|
  Buffers: shared hit=45103                                                                                                                                                                                                                                    |
  ->  Bitmap Heap Scan on mdlz_performancebenchmarking.tsvector_business_objects business_objects_alias  (cost=66996.67..752444.39 rows=513425 width=82) (actual time=312.130..312.247 rows=30 loops=1)                                                        |
        Output: type, value, label, synonyms, ((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery)), ((lower((value)::text) = 'OID'::text) OR (l|
        Recheck Cond: ((business_objects_alias.synonyms @> '{OID}'::text[]) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.label|
        Heap Blocks: exact=1                                                                                                                                                                                                                                   |
        Buffers: shared hit=45103                                                                                                                                                                                                                              |
        ->  BitmapOr  (cost=66996.67..66996.67 rows=666282 width=0) (actual time=310.732..310.732 rows=0 loops=1)                                                                                                                                              |
              Buffers: shared hit=45102                                                                                                                                                                                                                        |
              ->  Bitmap Index Scan on lower_case_synonym_tsvector_business_objects  (cost=0.00..189.20 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)                                                                                              |
                    Index Cond: (business_objects_alias.synonyms @> '{OID}'::text[])                                                                                                                                                                           |
                    Buffers: shared hit=2                                                                                                                                                                                                                      |
              ->  Bitmap Index Scan on txt_value_tsvector_business_objects  (cost=0.00..2638.25 rows=333140 width=0) (actual time=49.131..49.131 rows=575000 loops=1)                                                                                          |
                    Index Cond: (to_tsvector('english'::regconfig, lower((business_objects_alias.value)::text)) @@ '''oid'''::tsquery)                                                                                                                         |
                    Buffers: shared hit=86                                                                                                                                                                                                                     |
              ->  Bitmap Index Scan on txt_label_tsvector_business_objects  (cost=0.00..2638.25 rows=333140 width=0) (actual time=49.956..49.956 rows=575000 loops=1)                                                                                          |
                    Index Cond: (to_tsvector('english'::regconfig, lower((business_objects_alias.label)::text)) @@ '''oid'''::tsquery)                                                                                                                         |
                    Buffers: shared hit=86                                                                                                                                                                                                                     |
              ->  Bitmap Index Scan on type_value_label_lower_case_tsvector_business_objects  (cost=0.00..30444.59 rows=1 width=0) (actual time=93.168..93.169 rows=0 loops=1)                                                                                 |
                    Index Cond: (lower((business_objects_alias.value)::text) = 'OID'::text)                                                                                                                                                                    |
                    Buffers: shared hit=22464                                                                                                                                                                                                                  |
              ->  Bitmap Index Scan on type_value_label_lower_case_tsvector_business_objects  (cost=0.00..30444.59 rows=1 width=0) (actual time=118.465..118.465 rows=0 loops=1)                                                                               |
                    Index Cond: (lower((business_objects_alias.label)::text) = 'OID'::text)                                                                                                                                                                    |
                    Buffers: shared hit=22464                                                                                                                                                                                                                  |
Planning time: 0.249 ms                                                                                                                                                                                                                                        |
Execution time: 312.579 ms    

Output Data


type                 |value      |label      |synonyms|label_tsvector      |partial_value_label_match|exact_value_label_match|synonym_match|
---------------------|-----------|-----------|--------|--------------------|-------------------------|-----------------------|-------------|
orderid_1621409737948|OID-0127820|OID-0127820|NULL    |'-0127820':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355880|OID-0355880|NULL    |'-0355880':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0041048|OID-0041048|NULL    |'-0041048':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0499716|OID-0499716|NULL    |'-0499716':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219268|OID-0219268|NULL    |'-0219268':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0000560|OID-0000560|NULL    |'-0000560':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355656|OID-0355656|NULL    |'-0355656':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355628|OID-0355628|NULL    |'-0355628':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219380|OID-0219380|NULL    |'-0219380':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0063896|OID-0063896|NULL    |'-0063896':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0054740|OID-0054740|NULL    |'-0054740':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0447020|OID-0447020|NULL    |'-0447020':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0040964|OID-0040964|NULL    |'-0040964':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0499744|OID-0499744|NULL    |'-0499744':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219800|OID-0219800|NULL    |'-0219800':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355180|OID-0355180|NULL    |'-0355180':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0530348|OID-0530348|NULL    |'-0530348':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219996|OID-0219996|NULL    |'-0219996':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0220024|OID-0220024|NULL    |'-0220024':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0446936|OID-0446936|NULL    |'-0446936':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0220108|OID-0220108|NULL    |'-0220108':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0020692|OID-0020692|NULL    |'-0020692':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354872|OID-0354872|NULL    |'-0354872':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0097496|OID-0097496|NULL    |'-0097496':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354648|OID-0354648|NULL    |'-0354648':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0128268|OID-0128268|NULL    |'-0128268':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354536|OID-0354536|NULL    |'-0354536':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0530432|OID-0530432|NULL    |'-0530432':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0128324|OID-0128324|NULL    |'-0128324':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354256|OID-0354256|NULL    |'-0354256':2 'oid':1|true                     |false                  |false        |

The random_page_cost is 1.1.

I think the Postgres query planner thinks that using sequence scan would be faster but that does not seem to be the case.

Upvotes: 3

Views: 1586

Answers (2)

jjanes
jjanes

Reputation: 44202

A fundamental problem is that the planner thinks it will find 30 rows very early on in the seq scan, and so can stop very early. It is wrong, apparently because few of the rows that match your condition occur in the early part of the table. This can be very hard to fix just by improving the selectivity estimates, as it can make this mistake even if the row estimate are spot on, but the rows are not evenly scattered over the table.

Upvotes: 3

Laurenz Albe
Laurenz Albe

Reputation: 246688

The problem is that the many conditions are highly correlated, since they test almost the same thing. Since the optimizer doesn't know that and treats them as statistically independent, it arrives at a wrong estimate.

You could try to simplify the WHERE condition, or you coould upgrade to PostgreSQL v14, where commit a4d75c86bf1 introduced extended statistics on expressions.

Upvotes: 2

Related Questions