Reputation: 41
I am using a query to get different rows instead of semicolon-seprated values.
The table looks like this:
row_id aggregator
1 12;45
2 25
Using the query I want the output to look like:
row_id aggregator
1 12
1 45
2 25
I am using the following query:
SELECT
DISTINCT ROW_ID,
REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) as AGGREGATOR
FROM DUMMY_1
CONNECT BY REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) IS NOT NULL;
which it is very slow even for 300 records, and I have to work with 40000 records.
Upvotes: 4
Views: 13381
Reputation: 3875
Your connect by
produces much more records than needed, that's why the performance is poor and you need to use distinct
to limit the number records. An approach that does need distinct
would be:
select row_id, regexp_substr(aggregator,'[^;]+',1,n) aggregator
from dummy_1, (select level n from dual connect by level < 100)
where n <= regexp_count(aggregator,';')+1
The above works if number of semicolons is less than 99. The below solution does not have this limitation and is faster if the maximal number of semicolons is lower:
with dummy_c as (select row_id, aggregator, regexp_count(aggregator,';')+1 c from dummy_1)
select row_id, regexp_substr(aggregator,'[^;]+',1,n) aggregator
from dummy_c, (select level n from dual connect by level <= (select max(c) from dummy_c))
where n <= c
Upvotes: 0
Reputation: 5782
I think the DISTINCT may the problem. Besides, I do not understand why do you need to CONNECT BY REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) IS NOT NULL. You are using regexp in your select and connect by. Can you use where AGGREGATOR IS NOT NULL instead of connect by? Find a way to get rid of distinct and revise your query. You can use EXISTS instead of distinct... To help you more I need tables and data.
SELECT * FROM
(
SELECT REGEXP_SUBSTR(AGGREGATOR ,'[^;]+',1,LEVEL) as AGGREGATOR
FROM your_table
)
WHERE AGGREGATOR IS NOT NULL
/
Upvotes: -1
Reputation: 67722
Regular expressions are known to be expensive functions, so you should try to minimize their use when performance is critical (such as using standard functions in the CONNECT BY
clause).
Using standard functions (INSTR
, SUBSTR
, REPLACE
) will be more efficient, but the resulting code will be hard to read/understand/maintain.
I could not resist writing a recursive QTE, which I is much more efficient than both regular expressions and standard functions. Furthermore, recursive QTE queries have arguably some elegance. You'll need Oracle 11.2:
WITH rec_sql(row_id, aggregator, lvl, tail) AS (
SELECT row_id,
nvl(substr(aggregator, 1, instr(aggregator, ';') - 1),
aggregator),
1 lvl,
CASE WHEN instr(aggregator, ';') > 0 THEN
substr(aggregator, instr(aggregator, ';') + 1)
END tail
FROM dummy_1 initialization
UNION ALL
SELECT r.row_id,
nvl(substr(tail, 1, instr(tail, ';') - 1), tail),
lvl + 1,
CASE WHEN instr(tail, ';') > 0 THEN
substr(tail, instr(tail, ';') + 1)
END tail
FROM rec_sql r
WHERE r.tail IS NOT NULL
)
SELECT * FROM rec_sql;
You can see on SQLFiddle that this solution is very performant and on par with @A.B.Cade's solution. (Thanks to A.B.Cade for the test case).
Upvotes: 5
Reputation: 16905
Sometimes a pipelined table can be faster, try this:
create or replace type t is object(word varchar2(100), pk number);
/
create or replace type t_tab as table of t;
/
create or replace function split_string(del in varchar2) return t_tab
pipelined is
word varchar2(4000);
str_t varchar2(4000) ;
v_del_i number;
iid number;
cursor c is
select * from DUMMY_1;
begin
for r in c loop
str_t := r.aggregator;
iid := r.row_id;
while str_t is not null loop
v_del_i := instr(str_t, del, 1, 1);
if v_del_i = 0 then
word := str_t;
str_t := '';
else
word := substr(str_t, 1, v_del_i - 1);
str_t := substr(str_t, v_del_i + 1);
end if;
pipe row(t(word, iid));
end loop;
end loop;
return;
end split_string;
And here is another demo with 22 rows containing 3 vals in aggregator each - see the difference between first and second query..
Upvotes: 4