Reputation: 8518
I am having a hard time understanding why the Oracle CBO is behaving the way it does when a bind variable is part of a OR condition.
My environment
Oracle 12.2 over Red Hat Linux 7
HINT. I am just providing a simplification of the query where the problem is located
$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> @test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL> SELECT
2 whs.* ,
3 count(*) over () AS TOTAL
4 FROM ALFAMVS.WHS_LOANS whs
5 WHERE
6 ( nvl(:loanIds,'XX') = 'XX' or
7 loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
8 connect by level <= regexp_count(:loanIds,'[^,]+'))
9 )
10 ;
7 rows selected.
Elapsed: 00:00:18.72
Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6729 | 6748K| 2621 (1)| 00:00:01 |
| 1 | WINDOW BUFFER | | 6729 | 6748K| 2621 (1)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | WHS_LOANS | 113K| 110M| 2621 (1)| 00:00:01 |
|* 4 | FILTER | | | | | |
|* 5 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(:LOANIDS,'XX')='XX' OR EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
REGEXP_COUNT (:LOANIDS,'[^,]+')))
4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
Statistics
----------------------------------------------------------
288 recursive calls
630 db block gets
9913 consistent gets
1 physical reads
118724 redo size
13564 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
113003 sorts (memory)
0 sorts (disk)
7 rows processed
SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;
COUNT(*)
----------
113095
1 row selected.
Elapsed: 00:00:00.14
KEY POINTS
If I remove the expression NVL(:loanIds,'XX') = 'XX' OR
, the query takes 0.01 seconds. Why this OR expression with BINDs is giving such headache to the Optimizer ?
-- UPDATE --
I want to thank @Alex Poole for his suggestions and share with him that the third alternative ( removing the regular expressions ) has worked as a charm. It would be great to understand why, though. You have my most sincere gratitude. I used those for a while and I never had this problem. Also, the suggestion to use regexp_like
was even better than the original one with regexp_substr
and connect by level
, but much slower by far than the one where no regular expressions are used at all
Original query
7 rows selected.
Elapsed: 00:00:36.29
New query
7 rows selected.
Elapsed: 00:00:00.58
Once the EXISTS
disappeared of the internal predicate, the query works as fast as hell.
Thank you all for your comments !
Upvotes: 2
Views: 767
Reputation: 191265
From the execution plan the optimiser is, for some reason, re-evaluating the hierarchical query for every row in your table, and then using exists()
to see if that row's ID is in the result. It isn't clear why the or
is causing that. It might be something to raise with Oracle.
From experimenting I can see three ways to at least partially work around the problem - though I'm sure there are others. The first is to move the CSV expansion to a CTE and then force that to materialize with a hint:
WITH loanIds_cte (loanId) as (
select /*+ materialize */ regexp_substr(:loanIds,'[^,]+', 1, level)
from dual
connect by level <= regexp_count(:loanIds,'[^,]+')
)
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
loanid IN (select loanId from loanIds_cte)
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3226738189
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 11 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9FD2A6_198A2E1A | | | | |
|* 3 | CONNECT BY WITHOUT FILTERING| | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 5 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | TABLE ACCESS FULL | WHS_LOANS | 11300 | 99K| 9 (0)| 00:00:01 |
|* 8 | VIEW | | 1 | 2002 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9FD2A6_198A2E1A | 1 | 2002 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
6 - filter(:LOANIDS IS NULL OR EXISTS (SELECT 0 FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0"
"LOANID" FROM "SYS"."SYS_TEMP_0FD9FD2A6_198A2E1A" "T1") "LOANIDS_CTE" WHERE SYS_OP_C2C("LOANID")=:B1))
8 - filter(SYS_OP_C2C("LOANID")=:B1)
That still does the odd transformation to exists()
, but at least now that is querying the materialized CTE, so that connect by
query is only evaluated one.
Or you could compare each loadId
value with the full string using a regular expression:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 1102 | 9918 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR REGEXP_LIKE
(:LOANIDS,SYS_OP_C2C(U'(^|,)'||"LOANID"||U'(,|$)')))
which is slower than the CTE in my testing because regular expression are still expensive and you're doing 113k of them (still, better than 2 x 113k x number-of-elements of them).
Or you can avoid regular expressions and use several separate comparisons:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
:loanIds like loanId || ',%' or
:loanIds like '%,' || loanId or
:loanIds like '%,' || loanId || ',%'
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2096 | 18864 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 2096 | 18864 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 2096 | 18864 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR :LOANIDS LIKE
SYS_OP_C2C("LOANID"||U',%') OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID") OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID"||U',%'))
which is fastest of those three options in my limited testing. But there may well be better and faster approaches.
Not really relevant, but you seem to be running this as SYS which isn't a good idea, even if the data is in another schema; your loanId
column appears to be nvarchar2
(from the SYS_OP_C2C calls), which seems odd for something that could possibly be a number but in any case only seems likely to have ASCII characters; NVL(:loanIds,'')
doesn't do anything, since null and empty string are the same in Oracle; and nvl(:loanIds,'XX') = 'XX'
can be done as :loanIds is not null
which avoid magic values.
Upvotes: 2