Reputation: 395
I am struggling with the regex replacement solution that would remove all the text that are between quotes from VARCHAR2 field even if the text between these quotes has quoted text as well For example text:
'text start 'text inside' text end' leftover 'some other text'
after regex replacement should contain: leftover
What I have came up with is this code:
with tbl as (
select
'''text start ''text inside'' text end'' leftover ''some other text''' as str
,'\''(.*?)\''' as regex
from dual
)
select
tbl.str as strA
,regexp_replace(tbl.str,tbl.regex, '') as strB
from tbl;
but the text between subquotes still remains.
Is it even possible to achieve this with regular expressions, or should I split and analyze the contents in some loop ? An ideal solution would be if it could handle infinite levels occurrences of quoted text inside quoted text.
Upvotes: 0
Views: 1488
Reputation: 23767
An ideal solution would be if it could handle infinite levels occurrences of quoted text inside quoted text.
It's impossible with a single regular expression.
Neither recursive regexps, nor recursive capture buffers are available in Oracle.
UPD :
But it could be done by SQL:
with tbl as (
select
'''text start ''text inside'' text end'' leftover ''some other text'''
as str
from dual
)
select
listagg(text) within group (order by n)
from
(
select
n,
sum(decode(regexp_replace(str, '^(.*?([<>])){'||n||'}.*$', '\2'),
'<', 1, '>', -1, 0)) over (order by n) as nest,
regexp_replace(str, '^(.*?[<>]){'||n||'}([^<>]*).*$', '\2') as text
from
( select regexp_replace(regexp_replace(str, '(\s|^)''', '\1<'),
'''(\s|$)', '>\1') as str from tbl ),
( select level-1 as n from dual
connect by level-1 <= (select regexp_count(str, '''') from tbl) )
)
where nest = 0
Upvotes: 1
Reputation: 17278
try
, '^[^'']*(''.*'')[^'']*$' as regex
caveat: this will dumbly capture all content between the first and the last occurrence of single quotes inside tested text in capture group 1, including the outermost quotes themselves. in particular it does not check for proper nesting.
more important your replacement expr will be more complex:
, CASE WHEN REGEXP_INSTR(test, regex) > 0
THEN REPLACE ( test, REGEXP_REPLACE(test, regex, '\1'), '' )
ELSE test
END
if the regexp matches, the capture group is extracted first to be used in an ordinary replacement (this works because the matched portion is guaranteed to be maximal).
IMPORTANT: the solution won't produce the desired result in the particular context you have supplied. however, you cannot fare any better with plsql regexp
functions since the oracle regex engine does not offer extensions to express recursion in the pattern (as eg. pcre do). you need this facility to resolve nesting constructs (ie. perform balanced counting).
Upvotes: 1