Mohammed Arif
Mohammed Arif

Reputation: 61

Match a pattern using some condition and replace it with some string using regexp_replace in Oracle

I have a requirement to identify a particular string in a text using below conditions:

*1. Any string containing whitespace before and after it OR

  1. Any string containing a . (dot) as prefix and whitespace as suffix OR

  2. Any string having whitespace as prefix and , (comma) as suffix*

Once found, i need to replace it with another string without replacing the prefix and suffix as mentioned above. And this needs to be done in a pl/sql code in Oracle (preferably using regexp_replace function).

Example:

Text : 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,'

Replace string: 'replaced'

Output: 'This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced'

I know this is a weird example, but the actual requirement is even more weird than this. Please guide me how to achieve this.

Thank you in advance.

Upvotes: 0

Views: 1914

Answers (1)

Chris Saxon
Chris Saxon

Reputation: 9875

Rather than trying to write one giant regex, it may be easier to split the string into rows for each token. You can do this by adapting any of the CSV-to-rows methods around. e.g.:

with rws as (
  select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
  select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
  from   rws
  connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
  select * from vals;

STR               L    
This,                  1 
is                     2 
a                      3 
sample_text,           4 
which_needs_.          5 
to_be_replaced         6 
as                     7 
per,                   8 
the                    9 
matching.             10 
criteria.             11 
defined               12 
above,                13

Now replace each of these according to your rules. You're only dealing with one token at a time, so it's easy to see which you're replacing correctly. This makes the regex easier to write and debug:

with rws as (
  select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
  select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
  from   rws
  connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
  select case 
           when l = 1 then str
           when substr ( str, -1, 1 ) = '.' then
             str
           else 
           regexp_replace (
             str,
             '^[A-z_]+',
             'replaced'
           )
       end replaced, l
  from   vals;

REPLACED        L    
This,                1 
replaced             2 
replaced             3 
replaced,            4 
which_needs_.        5 
replaced             6 
replaced             7 
replaced,            8 
replaced             9 
matching.           10 
criteria.           11 
replaced            12 
replaced,           13

Then you listagg the values back together to get the final string:

with rws as (
  select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
  select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
  from   rws
  connect by regexp_substr(str, '[^, .]+', 1, level) is not null
), replaces as (
  select case 
           when l = 1 then str
           when substr ( str, -1, 1 ) = '.' then
             str
           else 
           regexp_replace (
             str,
             '[A-z_]+',
             'replaced'
           )
       end replaced, l
  from   vals
)
  select listagg ( replaced ) 
           within group ( order by l ) s
  from   replaces;

S                                                                                                                          
This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced,

Ensure you test thoroughly! In my experience, you find more exceptions/refinements when you have complex rules like this. So it's likely you'll have to tinker with the replacement rules in the case expression.

Upvotes: 2

Related Questions