MemoryLeak
MemoryLeak

Reputation: 7318

Oracle SQL performance optimization

I have two SQL statements whose performance I expect to be similar, but in fact SQL1 used 0.065 seconds and SQL2 used over 10 seconds with just 8000 records in total. Could anyone help to explain this? How can I optimize SQL2?

SQL 1:

select
    job_id,
    JOB_DESCRIPTION,
    REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
from smms.job 
where TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017;

SQL 2:

select job_id, JOB_Description 
from (
    select 
        job_id, 
        JOB_DESCRIPTION,
        REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
    from smms.job 
    where TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017
) 
where occurrences > 0;

Upvotes: 2

Views: 278

Answers (3)

Stefanos Zilellis
Stefanos Zilellis

Reputation: 611

At SQL1 it filters by (TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017) For the rows returned, executes (REGEXP_COUNT) per row

At SQL2 it filters by the result of (REGEXP_COUNT) which means that executes it against all table rows. Then, on that result, filters by (TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017)

To prove this, execute SQL1 without the filter. It will take approximately as much time as SQL2, maybe a little more.

To optimize you need to be 100% sure it will take SQL1 filter first. An absolute way would be to execute SQL1 and get the results into a temporary/memory table, then filter on them SQL2 filter

Upvotes: 0

Thomas Strub
Thomas Strub

Reputation: 1285

As pointed out from Martin the issue is the expensive regexp_count function. So reducing the question is:

Why is:

  select * from (
  with dat as (select level lv, rpad('X',500,'X') txt from dual connect by level <= 20000)
  select lv, 
         REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
  from   dat 
  --where  REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') > 1
  ) where rownum > 1

0.019 seconds and

  select * from (
  with dat as (select level lv, rpad('X',500,'X') txt from dual connect by level <= 20000)
  select lv, 
         REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
  from   dat 
  where  REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') > 1
  ) where rownum > 1

6.7 seconds. Oracle evaluates the regexp_count in both executions. So there must be a difference in the evaluation in the where part and in the select part.

Upvotes: 0

Martin Preiss
Martin Preiss

Reputation: 396

thinking again about the information I guess the two strategies are:

SQL 1:

  • Filter the rows with TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017
  • use the function REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') on the resulting rows

SQL 2:

  • use the function REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') on all rows
  • filter the result with TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017

Since regexp functions are very expensive in Oracle this could explain the difference in performance.

Version 2 could be optimized with hints - for example with MATERIALIZE, if you add a CTE.

Upvotes: 1

Related Questions