Reputation: 7318
I have two SQL statements whose performance I expect to be similar, but in fact SQL1 used 0.065 seconds and SQL2 used over 10 seconds with just 8000 records in total. Could anyone help to explain this? How can I optimize SQL2?
SQL 1:
select
job_id,
JOB_DESCRIPTION,
REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') as occurrences
from smms.job
where TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017;
SQL 2:
select job_id, JOB_Description
from (
select
job_id,
JOB_DESCRIPTION,
REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') as occurrences
from smms.job
where TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017
)
where occurrences > 0;
Upvotes: 2
Views: 278
Reputation: 611
At SQL1 it filters by (TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017) For the rows returned, executes (REGEXP_COUNT) per row
At SQL2 it filters by the result of (REGEXP_COUNT) which means that executes it against all table rows. Then, on that result, filters by (TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017)
To prove this, execute SQL1 without the filter. It will take approximately as much time as SQL2, maybe a little more.
To optimize you need to be 100% sure it will take SQL1 filter first. An absolute way would be to execute SQL1 and get the results into a temporary/memory table, then filter on them SQL2 filter
Upvotes: 0
Reputation: 1285
As pointed out from Martin the issue is the expensive regexp_count function. So reducing the question is:
Why is:
select * from (
with dat as (select level lv, rpad('X',500,'X') txt from dual connect by level <= 20000)
select lv,
REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') as occurrences
from dat
--where REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') > 1
) where rownum > 1
0.019 seconds and
select * from (
with dat as (select level lv, rpad('X',500,'X') txt from dual connect by level <= 20000)
select lv,
REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') as occurrences
from dat
where REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') > 1
) where rownum > 1
6.7 seconds. Oracle evaluates the regexp_count in both executions. So there must be a difference in the evaluation in the where part and in the select part.
Upvotes: 0
Reputation: 396
thinking again about the information I guess the two strategies are:
SQL 1:
TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017
REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}')
on the resulting rowsSQL 2:
REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}')
on all rowsTO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017
Since regexp functions are very expensive in Oracle this could explain the difference in performance.
Version 2 could be optimized with hints - for example with MATERIALIZE, if you add a CTE.
Upvotes: 1