Reputation: 187
In this case, the table has many text columns and I need to perform a (ilike) search on every column. I proceeded to create a gin index (gin_trgm_ops from the extension pg_trgm) for each of these columns to speedup the search, and indeed the gain is great. These are in addition to regular indexes (some queries use simple equality conditon).
However, the test shows that Postgres fails to choose the correct plan if too many conditions are OR-tied together.
Query with correct plan :
db=# explain analyze select topics_id from t_topics_header a
join t_topics_group b using (topics_group_id)
where a.subject ilike '%aaa%' or a.contents ilike '%aaa%' or a.topics_group_id = 7 and
( ( a.ext_col_01 ilike '%aaa%') or ( a.ext_col_50 ilike '%aaa%') or
( a.ext_col_57 ilike '%aaa%') or ( a.ext_col_56 ilike '%aaa%') or
( a.ext_col_63 ilike '%aaa%') or ( a.ext_col_64 ilike '%aaa%') or
( a.ext_col_54 ilike '%aaa%') or ( a.ext_col_69 ilike '%aaa%') or
( a.ext_col_31 ilike '%aaa%') or ( a.ext_col_32 ilike '%aaa%') or
( a.ext_col_41 ilike '%aaa%') or ( a.ext_col_91 ilike '%aaa%') or
( a.ext_col_42 ilike '%aaa%') or ( a.ext_col_92 ilike '%aaa%') or
( a.ext_col_43 ilike '%aaa%') or ( a.ext_col_93 ilike '%aaa%') or
( a.ext_col_44 ilike '%aaa%') or ( a.ext_col_94 ilike '%aaa%') or
( a.ext_col_45 ilike '%aaa%') or ( a.ext_col_95 ilike '%aaa%')
) order by topics_id desc OFFSET 0 LIMIT 10;
Query plan :
Limit (cost=2739.84..2739.87 rows=10 width=4) (actual time=0.437..0.437 rows=4 loops=1)
-> Sort (cost=2739.84..2741.03 rows=473 width=4) (actual time=0.436..0.436 rows=4 loops=1)
Sort Key: a.topics_id DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=708.03..2729.62 rows=473 width=4) (actual time=0.416..0.426 rows=4 loops=1)
Hash Cond: (a.topics_group_id = b.topics_group_id)
-> Bitmap Heap Scan on t_topics_header a (cost=706.72..2721.80 rows=473 width=8) (actual time=0.381..0.391 rows=4 loops=1)
Recheck Cond: ((subject ~~* '%aaa%'::text) OR (contents ~~* '%aaa%'::text) OR ((ext_col_01 ~~* '%aaa%'::text) OR (ext_col_50 ~~* '%aaa%'::text) OR (ext_col_57 ~~* '%aaa%'::text) OR (ext_col_56 ~~* '%aaa%'::text) OR
(ext_col_63 ~~* '%aaa%'::text) OR (ext_col_64 ~~* '%aaa%'::text) OR (ext_col_54 ~~* '%aaa%'::text) OR (ext_col_69 ~~* '%aaa%'::text) OR (ext_col_31 ~~* '%aaa%'::text) OR (ext_col_32 ~~* '%aaa%'::text) OR (ext_col_41 ~~* '%aaa%'::text) O
R (ext_col_91 ~~* '%aaa%'::text) OR (ext_col_42 ~~* '%aaa%'::text) OR (ext_col_92 ~~* '%aaa%'::text) OR (ext_col_43 ~~* '%aaa%'::text) OR (ext_col_93 ~~* '%aaa%'::text) OR (ext_col_44 ~~* '%aaa%'::text) OR (ext_col_94 ~~* '%aaa%'::text)
OR (ext_col_45 ~~* '%aaa%'::text) OR (ext_col_95 ~~* '%aaa%'::text)))
Filter: ((subject ~~* '%aaa%'::text) OR (contents ~~* '%aaa%'::text) OR ((topics_group_id = 7) AND ((ext_col_01 ~~* '%aaa%'::text) OR (ext_col_50 ~~* '%aaa%'::text) OR (ext_col_57 ~~* '%aaa%'::text) OR (ext_col_56 ~
~* '%aaa%'::text) OR (ext_col_63 ~~* '%aaa%'::text) OR (ext_col_64 ~~* '%aaa%'::text) OR (ext_col_54 ~~* '%aaa%'::text) OR (ext_col_69 ~~* '%aaa%'::text) OR (ext_col_31 ~~* '%aaa%'::text) OR (ext_col_32 ~~* '%aaa%'::text) OR (ext_col_41
~~* '%aaa%'::text) OR (ext_col_91 ~~* '%aaa%'::text) OR (ext_col_42 ~~* '%aaa%'::text) OR (ext_col_92 ~~* '%aaa%'::text) OR (ext_col_43 ~~* '%aaa%'::text) OR (ext_col_93 ~~* '%aaa%'::text) OR (ext_col_44 ~~* '%aaa%'::text) OR (ext_col_
94 ~~* '%aaa%'::text) OR (ext_col_45 ~~* '%aaa%'::text) OR (ext_col_95 ~~* '%aaa%'::text))))
Heap Blocks: exact=4
-> BitmapOr (cost=706.72..706.72 rows=516 width=0) (actual time=0.375..0.375 rows=0 loops=1)
-> Bitmap Index Scan on t_topics_header_idx_subject_gin_trgm (cost=0.00..12.91 rows=122 width=0) (actual time=0.066..0.066 rows=4 loops=1)
Index Cond: (subject ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_contents_gin_trgm (cost=0.00..8.00 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)
Index Cond: (contents ~~* '%aaa%'::text)
-> BitmapOr (cost=685.32..685.32 rows=394 width=0) (actual time=0.292..0.292 rows=0 loops=1)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_01_gin_trgm (cost=0.00..8.85 rows=113 width=0) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: (ext_col_01 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_50_gin_trgm (cost=0.00..12.06 rows=8 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_50 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_57_gin_trgm (cost=0.00..12.12 rows=16 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_57 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_56_gin_trgm (cost=0.00..80.00 rows=1 width=0) (actual time=0.023..0.023 rows=0 loops=1)
Index Cond: (ext_col_56 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_63_gin_trgm (cost=0.00..84.05 rows=7 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (ext_col_63 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_64_gin_trgm (cost=0.00..72.04 rows=6 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (ext_col_64 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_54_gin_trgm (cost=0.00..68.00 rows=1 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_54 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_69_gin_trgm (cost=0.00..12.02 rows=3 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_69 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_31_gin_trgm (cost=0.00..12.04 rows=6 width=0) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: (ext_col_31 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_32_gin_trgm (cost=0.00..16.76 rows=101 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_32 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_41_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: (ext_col_41 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_91_gin_trgm (cost=0.00..20.58 rows=78 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (ext_col_91 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_42_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: (ext_col_42 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_92_gin_trgm (cost=0.00..12.19 rows=25 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (ext_col_92 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_43_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_43 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_93_gin_trgm (cost=0.00..8.20 rows=26 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (ext_col_93 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_44_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_44 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_94_gin_trgm (cost=0.00..8.04 rows=5 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (ext_col_94 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_45_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_45 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_95_gin_trgm (cost=0.00..76.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_95 ~~* '%aaa%'::text)
-> Hash (cost=1.14..1.14 rows=14 width=4) (actual time=0.027..0.027 rows=14 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t_topics_group b (cost=0.00..1.14 rows=14 width=4) (actual time=0.013..0.019 rows=14 loops=1)
Planning time: 3.767 ms
Execution time: 0.687 ms
Query with bad plan :
db=# explain analyze select topics_id from t_topics_header a
join t_topics_group b using (topics_group_id)
where a.subject ilike '%aaa%' or a.contents ilike '%aaa%' or a.topics_group_id = 7 and
( ( a.ext_col_01 ilike '%aaa%') or ( a.ext_col_50 ilike '%aaa%') or
( a.ext_col_57 ilike '%aaa%') or ( a.ext_col_56 ilike '%aaa%') or
( a.ext_col_63 ilike '%aaa%') or ( a.ext_col_64 ilike '%aaa%') or
( a.ext_col_54 ilike '%aaa%') or ( a.ext_col_69 ilike '%aaa%') or
( a.ext_col_31 ilike '%aaa%') or ( a.ext_col_32 ilike '%aaa%') or
( a.ext_col_41 ilike '%aaa%') or ( a.ext_col_91 ilike '%aaa%') or
( a.ext_col_42 ilike '%aaa%') or ( a.ext_col_92 ilike '%aaa%') or
( a.ext_col_43 ilike '%aaa%') or ( a.ext_col_93 ilike '%aaa%') or
( a.ext_col_44 ilike '%aaa%') or ( a.ext_col_94 ilike '%aaa%') or
( a.ext_col_45 ilike '%aaa%') or ( a.ext_col_95 ilike '%aaa%') or
( a.ext_col_70 ilike '%aaa%')
) order by topics_id desc OFFSET 0 LIMIT 10;
Please notice that the only difference is the added ( a.ext_col_70 ilike '%aaa%')
.
And the new query plan becomes :
Limit (cost=0.43..2036.67 rows=10 width=4) (actual time=626.343..2784.151 rows=4 loops=1)
-> Nested Loop (cost=0.43..418447.61 rows=2055 width=4) (actual time=626.341..2784.147 rows=4 loops=1)
Join Filter: (a.topics_group_id = b.topics_group_id)
Rows Removed by Join Filter: 52
-> Index Scan Backward using t_report_master_pkey on t_topics_header a (cost=0.43..418014.89 rows=2055 width=8) (actual time=626.328..2784.119 rows=4 loops=1)
Filter: ((subject ~~* '%aaa%'::text) OR (contents ~~* '%aaa%'::text) OR ((topics_group_id = 7) AND ((ext_col_01 ~~* '%aaa%'::text) OR (ext_col_50 ~~* '%aaa%'::text) OR (ext_col_57 ~~* '%aaa%'::text) OR (ext_col_56 ~~* '%a
aa%'::text) OR (ext_col_63 ~~* '%aaa%'::text) OR (ext_col_64 ~~* '%aaa%'::text) OR (ext_col_54 ~~* '%aaa%'::text) OR (ext_col_69 ~~* '%aaa%'::text) OR (ext_col_31 ~~* '%aaa%'::text) OR (ext_col_32 ~~* '%aaa%'::text) OR (ext_col_41 ~~* '
%aaa%'::text) OR (ext_col_91 ~~* '%aaa%'::text) OR (ext_col_42 ~~* '%aaa%'::text) OR (ext_col_92 ~~* '%aaa%'::text) OR (ext_col_43 ~~* '%aaa%'::text) OR (ext_col_93 ~~* '%aaa%'::text) OR (ext_col_44 ~~* '%aaa%'::text) OR (ext_col_94 ~~*
'%aaa%'::text) OR (ext_col_45 ~~* '%aaa%'::text) OR (ext_col_95 ~~* '%aaa%'::text) OR (ext_col_70 ~~* '%aaa%'::text))))
Rows Removed by Filter: 1237807
-> Materialize (cost=0.00..1.21 rows=14 width=4) (actual time=0.002..0.003 rows=14 loops=4)
-> Seq Scan on t_topics_group b (cost=0.00..1.14 rows=14 width=4) (actual time=0.004..0.007 rows=14 loops=1)
Planning time: 3.704 ms
Execution time: 2784.262 ms
I have around twice the shown amount of ext_cols (I reduced them to the minimal breaking point where the plan changes). I have gin pg_trgm indexes on EVERY ext_col, in addition to regular index on SOME of them (double checked). Also I run VACUUM ANALYZE on both tables.
DB version is PostgreSQL 9.6.6
So is there a way to hint use bitmap indexes to the analyzer ? Any ideas ?
EDIT : Apparently the length of the string has importance. With 3 letters or less, analyzer picks (bad) index scan. With 4 or more characters string, good (bitmap index) plan is chosen.
Upvotes: 3
Views: 1706
Reputation:
If you are stuck with this design, you could create an index on the concatenated value of all columns and use that expression in your where condition.
Something like:
create function query_columns(p_rec t_topics_header)
returns text
as
$$
select concat_ws(' ', p_rec.ext_col_01, p_rec.ext_col_50, p_rec.ext_col_57, p_rec.ext_col_56);
$$
language sql
immutable;
Adjust the number of columns inside concat_ws()
function to those that you want to query. I am not sure what the maximum length of an indexed expression for a GIN index is. Maybe your many columns will exceed that limit.
You can then create an index on that function:
create index on t_topics_header using gin ( (query_columns(x)) public.gin_trgm_ops);
Then the following should use the index:
select topics_id
from t_topics_header a
join t_topics_group b using (topics_group_id)
where (a.subject ilike '%aaa%' or a.contents ilike '%aaa%'
or a.topics_group_id = 7 )
and query_columns(a) ilike ='%aaa%';
Upvotes: 3