Reputation: 28386
How can I tune the PostgreSQL query plan (or the SQL query itself) to make better use of the available indexes when the query contains an SQL OR condition that uses the LIKE operator instead of =?
For example consider the following query that takes just 5 milliseconds to execute:
explain analyze select *
from report_workflow.request_attribute
where domain = 'externalId'
and scoped_id_value[2] = 'G130324135454100';
"Bitmap Heap Scan on request_attribute (cost=44.94..6617.85 rows=2081 width=139) (actual time=4.619..4.619 rows=2 loops=1)"
" Recheck Cond: (((scoped_id_value[2])::text = 'G130324135454100'::text) AND ((domain)::text = 'externalId'::text))"
" -> Bitmap Index Scan on request_attribute_accession_number (cost=0.00..44.42 rows=2081 width=0) (actual time=3.777..3.777 rows=2 loops=1)"
" Index Cond: ((scoped_id_value[2])::text = 'G130324135454100'::text)"
"Total runtime: 5.059 ms"
As the query plan shows, this query takes advantage of the partial index request_attribute_accession_number and the index condition scoped_id_value[2] = 'G130324135454100'. Index request_attribute_accession_number has the following definition:
CREATE INDEX request_attribute_accession_number
ON report_workflow.request_attribute((scoped_id_value[2]))
WHERE domain = 'externalId';
(Note that column scoped_id_value in table request_attribute has type character varying[].)
However, when I add to the same query an extra OR condition that uses the same array column element scoped_id_value[2], but with the LIKE operator instead of =, the query, despite producing the same result, now takes 7553 ms:
explain analyze select *
from report_workflow.request_attribute
where domain = 'externalId'
and (scoped_id_value[2] = 'G130324135454100'
or scoped_id_value[2] like '%G130324135454100%');
"Bitmap Heap Scan on request_attribute (cost=7664.77..46768.27 rows=2122 width=139) (actual time=142.164..7552.650 rows=2 loops=1)"
" Recheck Cond: ((domain)::text = 'externalId'::text)"
" Rows Removed by Index Recheck: 1728712"
" Filter: (((scoped_id_value[2])::text = 'G130324135454100'::text) OR ((scoped_id_value[2])::text ~~ '%G130324135454100%'::text))"
" Rows Removed by Filter: 415884"
" -> Bitmap Index Scan on request_attribute_accession_number (cost=0.00..7664.24 rows=416143 width=0) (actual time=136.249..136.249 rows=415886 loops=1)"
"Total runtime: 7553.154 ms"
Notice that this time the query optimizer ignores the index condition scoped_id_value[2] = 'G130324135454100' when it performs the inner bitmap index scan using index request_attribute_accession_number, and consequently generates 415,886 rows instead of just two, as the first query did.
When the OR condition with the LIKE operator is introduced into this second query, why does the optimizer produce a much less efficient plan than it did for the first? How can I tune the query optimizer, or the query itself, to perform more like the first query?
Upvotes: 1
Views: 375
Reputation: 125244
It is not possible to short-circuit this expression:
scoped_id_value[2] = 'G130324135454100'
or scoped_id_value[2] like '%G130324135454100%'
into this one:
scoped_id_value[2] = 'G130324135454100'
because it would miss values that have characters before or after the search string, which would still be matched by:
scoped_id_value[2] like '%G130324135454100%'
The only simplification possible goes the other way: keep just the last condition, since the equality match is a subset of what the pattern matches. And even that only works if PostgreSQL realizes that the core string (between the %s) in the pattern is the same as the value in the equality comparison.
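Applying that one valid simplification by hand means keeping only the LIKE branch (a sketch based on the query in the question; note it removes the redundant condition but does not by itself make the predicate indexable):

```sql
-- The equality match is a subset of the pattern match,
-- so the OR can be reduced by hand to the LIKE alone.
-- A plain b-tree index still cannot serve the '%...%' pattern.
select *
from report_workflow.request_attribute
where domain = 'externalId'
  and scoped_id_value[2] like '%G130324135454100%';
```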
Upvotes: 1
Reputation: 1269803
In the second plan, you have:
scoped_id_value[2] like '%G130324135454100%'
Neither Postgres nor any other database can use a b-tree index to solve this. Where would it look in the index? With a leading wildcard there is no prefix to seek to, so it has to scan all the rows.
You can handle this, for this one case, by building an index on an expression (see here). However, that would be very specific to the string 'G130324135454100'.
I should add that the problem is not the like itself. Postgres would use the index on:
scoped_id_value[2] like 'G130324135454100%'
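(A b-tree index serves such a prefix pattern only when the column's collation is C or the index was declared with text_pattern_ops.) For the '%...%' pattern in the question, one option that can make such searches indexable is a trigram index from the pg_trgm extension. A sketch, assuming PostgreSQL 9.1 or later; the index name is illustrative only:

```sql
-- pg_trgm provides trigram operator classes that let a GIN
-- index serve arbitrary like '%...%' patterns.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX request_attribute_scoped_id_trgm
  ON report_workflow.request_attribute
  USING gin (((scoped_id_value[2])::text) gin_trgm_ops)
  WHERE domain = 'externalId';
```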
Upvotes: 2