Batu.Khan
Batu.Khan

Reputation: 3065

How to get all substring occurences between some characters?

What i'm trying to get is the part of a column text that is between some characters ($$ to be exact) but the trick is those characters can occur more than twice (but always even like if there are more than 2 than it must be like $$xxx$$ ... $$yyy$$) and I need to get them separately.

When I try this, if the pattern only occur once then it's no problem :

regexp_substr(txt,'\$\$(.*)\$\$',1,1,null,1)

But lets say the column text is : $$xxx$$ ... $$yyy$$

then it gives me : xxx$$ ... $$yyy

but what I need is two get them in separate lines like :

xxx
yyy

which I couldn't get it done so how?

Upvotes: 0

Views: 57

Answers (2)

Gary_W
Gary_W

Reputation: 10360

One could also use CONNECT BY to "loop" through the elements surrounded by the double dollar signs, returning the data inside (the 2nd grouping). This method handles NULL elements (ID 7, element 2) and since the dollar signs are consumed as the regex moves from left to right, characters in between the groups are not falsely matched.

SQL> with tbl(id, txt) as (
     select 1, '$$xxx$$' from dual union all
     select 2, '$$xxx$$ ... $$yyy$$' from dual union all
     select 3, '' from dual union all
     select 4, '$$xxx$$abc$$yyy$$' from dual union all
     select 5, '$$xxx$$ ... $$yyy$$ ... $$www$$ ... $$zzz$$' from dual union all
     select 6, '$$aaa$$$$bbb$$$$ccc$$$$ddd$$' from dual union all
     select 7, '$$aaa$$$$$$$$ccc$$$$ddd$$' from dual
   )
   select id, level, regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level,null,2) element
   from tbl
   connect by regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level) is not null
     and prior txt = txt
     and prior sys_guid() is not null
   order by id, level;

        ID      LEVEL ELEMENT
---------- ---------- -------------------------------------------
         1          1 xxx
         2          1 xxx
         2          2 yyy
         3          1
         4          1 xxx
         4          2 yyy
         5          1 xxx
         5          2 yyy
         5          3 www
         5          4 zzz
         6          1 aaa
         6          2 bbb
         6          3 ccc
         6          4 ddd
         7          1 aaa
         7          2
         7          3 ccc
         7          4 ddd

18 rows selected.

SQL>

Upvotes: 1

trincot
trincot

Reputation: 350147

You could use a recursive query that matches the first occurrence and then removes that from the string for the next iteration of the recursive query.

Assuming your table and column are called tbl and txt:

with cte(match, txt) as (
    select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
           regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
    from   tbl
    where  regexp_like(txt,'\$\$(.*?)\$\$') 
    union all
    select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
           regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
    from   cte
    where  regexp_like(txt,'\$\$(.*?)\$\$') 
)
select match from cte

Upvotes: 3

Related Questions