Derek Mahar
Derek Mahar

Reputation: 28386

How can I tune the PostgreSQL query optimizer when using SQL logical 'OR' and operator 'LIKE'?

How can I tune the PostgreSQL query plan (or the SQL query itself) to make more optimal use of available indexes when the query contains an SQL OR condition that uses the LIKE operator instead of =?

For example consider the following query that takes just 5 milliseconds to execute:

explain analyze select *
from report_workflow.request_attribute
where domain = 'externalId'
and scoped_id_value[2] = 'G130324135454100';

"Bitmap Heap Scan on request_attribute  (cost=44.94..6617.85 rows=2081 width=139) (actual time=4.619..4.619 rows=2 loops=1)"
"  Recheck Cond: (((scoped_id_value[2])::text = 'G130324135454100'::text) AND ((domain)::text = 'externalId'::text))"
"  ->  Bitmap Index Scan on request_attribute_accession_number  (cost=0.00..44.42 rows=2081 width=0) (actual time=3.777..3.777 rows=2 loops=1)"
"        Index Cond: ((scoped_id_value[2])::text = 'G130324135454100'::text)"
"Total runtime: 5.059 ms"

As the query plan shows, this query takes advantage of partial index request_attribute_accession_number and index condition scoped_id_value[2] = 'G130324135454100'. Index request_attribute_accession_number has the following definition:

CREATE INDEX request_attribute_accession_number
ON report_workflow.request_attribute((scoped_id_value[2]))
WHERE domain = 'externalId';

(Note that column scoped_id_value in table request_attribute has type character varying[].)

However, when I add to the same query an extra OR condition that uses the same array column element scoped_id_value[2], but the the LIKE operator instead of =, the query, despite producing the same result for the same first condition, now takes 7553 ms:

explain analyze select *
from report_workflow.request_attribute
where domain = 'externalId'
and (scoped_id_value[2] = 'G130324135454100'
or scoped_id_value[2] like '%G130324135454100%');

"Bitmap Heap Scan on request_attribute  (cost=7664.77..46768.27 rows=2122 width=139) (actual time=142.164..7552.650 rows=2 loops=1)"
"  Recheck Cond: ((domain)::text = 'externalId'::text)"
"  Rows Removed by Index Recheck: 1728712"
"  Filter: (((scoped_id_value[2])::text = 'G130324135454100'::text) OR ((scoped_id_value[2])::text ~~ '%G130324135454100%'::text))"
"  Rows Removed by Filter: 415884"
"  ->  Bitmap Index Scan on request_attribute_accession_number  (cost=0.00..7664.24 rows=416143 width=0) (actual time=136.249..136.249 rows=415886 loops=1)"
"Total runtime: 7553.154 ms"

Notice that this time the query optimizer ignores index condition scoped_id_value[2] = 'G130324135454100' when it performs the inner bitmap index scan using index request_attribute_accession_number and consequently generates 415,886 rows instead of just two as did the first query.

When introducing the OR condition with the LIKE operator into this second query, why does the optimizer produce a much less optimal query plan as the first? How can I tune the query optimizer or the query to perform more like the first query?

Upvotes: 1

Views: 375

Answers (2)

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125244

It is not possible to short circuit this expression:

scoped_id_value[2] = 'G130324135454100'
or scoped_id_value[2] like '%G130324135454100%'

into this one:

scoped_id_value[2] = 'G130324135454100'

because it would not catch the cases where there are characters before or after that which would be matched with:

scoped_id_value[2] like '%G130324135454100%'

The only short circuit possible would be the last one. And only if Postgresql realizes that the core string (between %s) in the last one is the same as the previous one.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269803

In the second plan, you have:

scoped_id_value[2] like '%G130324135454100%'

Postgres (nor any other database) cannot use the index to solve this. Where would it look in the index? It doesn't even know where to start, so it has to do a full table scan.

You can handle this, for this one case, by building an index on an expression (see here). However, that would be very specific to the string 'G130324135454100'.

I should add, the problem is not the like. Postgres would use the index on:

scoped_id_value[2] like 'G130324135454100%'

Upvotes: 2

Related Questions