user2114275
user2114275

Reputation: 41

oracle query slow with REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL)

I am using a query to get different rows instead of semicolon-seprated values.

The table looks like this:

row_id  aggregator
1       12;45
2       25

Using the query I want the output to look like:

row_id  aggregator
1       12
1       45
2       25

I am using the following query:

SELECT
DISTINCT ROW_ID,
REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) as AGGREGATOR
FROM DUMMY_1
CONNECT BY REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) IS NOT NULL;

which it is very slow even for 300 records, and I have to work with 40000 records.

Upvotes: 4

Views: 13381

Answers (4)

mik
mik

Reputation: 3875

Your connect by produces much more records than needed, that's why the performance is poor and you need to use distinct to limit the number records. An approach that does need distinct would be:

select row_id, regexp_substr(aggregator,'[^;]+',1,n) aggregator
  from dummy_1, (select level n from dual connect by level < 100)
 where n <= regexp_count(aggregator,';')+1

The above works if number of semicolons is less than 99. The below solution does not have this limitation and is faster if the maximal number of semicolons is lower:

with dummy_c as (select row_id, aggregator, regexp_count(aggregator,';')+1 c from dummy_1)
select row_id, regexp_substr(aggregator,'[^;]+',1,n) aggregator
  from dummy_c, (select level n from dual connect by level <= (select max(c) from dummy_c))
 where n <= c

Upvotes: 0

Art
Art

Reputation: 5782

I think the DISTINCT may the problem. Besides, I do not understand why do you need to CONNECT BY REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) IS NOT NULL. You are using regexp in your select and connect by. Can you use where AGGREGATOR IS NOT NULL instead of connect by? Find a way to get rid of distinct and revise your query. You can use EXISTS instead of distinct... To help you more I need tables and data.

SELECT * FROM
(
 SELECT REGEXP_SUBSTR(AGGREGATOR ,'[^;]+',1,LEVEL) as AGGREGATOR                      
   FROM your_table
)
WHERE AGGREGATOR IS NOT NULL
/

Upvotes: -1

Vincent Malgrat
Vincent Malgrat

Reputation: 67722

Regular expressions are known to be expensive functions, so you should try to minimize their use when performance is critical (such as using standard functions in the CONNECT BY clause).

Using standard functions (INSTR, SUBSTR, REPLACE) will be more efficient, but the resulting code will be hard to read/understand/maintain.

I could not resist writing a recursive QTE, which I is much more efficient than both regular expressions and standard functions. Furthermore, recursive QTE queries have arguably some elegance. You'll need Oracle 11.2:

WITH rec_sql(row_id, aggregator, lvl, tail) AS (
SELECT row_id, 
       nvl(substr(aggregator, 1, instr(aggregator, ';') - 1), 
           aggregator),
       1 lvl,
       CASE WHEN instr(aggregator, ';') > 0 THEN
          substr(aggregator, instr(aggregator, ';') + 1)
       END tail
  FROM dummy_1 initialization
UNION ALL
SELECT r.row_id, 
       nvl(substr(tail, 1, instr(tail, ';') - 1), tail), 
       lvl + 1, 
       CASE WHEN instr(tail, ';') > 0 THEN
          substr(tail, instr(tail, ';') + 1)
       END tail
  FROM rec_sql r
 WHERE r.tail IS NOT NULL
)
SELECT * FROM rec_sql;

You can see on SQLFiddle that this solution is very performant and on par with @A.B.Cade's solution. (Thanks to A.B.Cade for the test case).

Upvotes: 5

A.B.Cade
A.B.Cade

Reputation: 16905

Sometimes a pipelined table can be faster, try this:

create or replace type t is object(word varchar2(100), pk number);
/
create or replace type t_tab as table of t;
/

create or replace function split_string(del in varchar2) return t_tab
  pipelined is

  word    varchar2(4000);
  str_t   varchar2(4000) ;
  v_del_i number;
  iid     number;

  cursor c is
    select * from DUMMY_1; 

begin

  for r in c loop
    str_t := r.aggregator;
    iid   := r.row_id;

    while str_t is not null loop

      v_del_i := instr(str_t, del, 1, 1);

      if v_del_i = 0 then
        word  := str_t;
        str_t := '';
      else
        word  := substr(str_t, 1, v_del_i - 1);
        str_t := substr(str_t, v_del_i + 1);
      end if;

      pipe row(t(word, iid));

    end loop;

  end loop;

  return;
end split_string;

Here is a sqlfiddle demo

And here is another demo with 22 rows containing 3 vals in aggregator each - see the difference between first and second query..

Upvotes: 4

Related Questions