Reputation: 61
I have a requirement to identify a particular string in a text using below conditions:
*1. Any string containing whitespace before and after it OR
Any string containing a . (dot) as prefix and whitespace as suffix OR
Any string having whitespace as prefix and , (comma) as suffix*
Once found, i need to replace it with another string without replacing the prefix and suffix as mentioned above. And this needs to be done in a pl/sql code in Oracle (preferably using regexp_replace function).
Example:
Text : 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,'
Replace string: 'replaced'
Output: 'This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced'
I know this is a weird example, but the actual requirement is even more weird than this. Please guide me how to achieve this.
Thank you in advance.
Upvotes: 0
Views: 1914
Reputation: 9875
Rather than trying to write one giant regex, it may be easier to split the string into rows for each token. You can do this by adapting any of the CSV-to-rows methods around. e.g.:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
select * from vals;
STR L
This, 1
is 2
a 3
sample_text, 4
which_needs_. 5
to_be_replaced 6
as 7
per, 8
the 9
matching. 10
criteria. 11
defined 12
above, 13
Now replace each of these according to your rules. You're only dealing with one token at a time, so it's easy to see which you're replacing correctly. This makes the regex easier to write and debug:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
select case
when l = 1 then str
when substr ( str, -1, 1 ) = '.' then
str
else
regexp_replace (
str,
'^[A-z_]+',
'replaced'
)
end replaced, l
from vals;
REPLACED L
This, 1
replaced 2
replaced 3
replaced, 4
which_needs_. 5
replaced 6
replaced 7
replaced, 8
replaced 9
matching. 10
criteria. 11
replaced 12
replaced, 13
Then you listagg
the values back together to get the final string:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
), replaces as (
select case
when l = 1 then str
when substr ( str, -1, 1 ) = '.' then
str
else
regexp_replace (
str,
'[A-z_]+',
'replaced'
)
end replaced, l
from vals
)
select listagg ( replaced )
within group ( order by l ) s
from replaces;
S
This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced,
Ensure you test thoroughly! In my experience, you find more exceptions/refinements when you have complex rules like this. So it's likely you'll have to tinker with the replacement rules in the case expression.
Upvotes: 2