jayz
jayz

Reputation: 401

Create a pattern to capture with a text which does not have letters or numbers behind or infront of them

I am using REGEXP_REPLACE to replace a character set which does not have a number or character infront or behind them. Basically my reqirement is to remove the word PT, but only when it is not used in the middle of a word that has numbers and letters.

'Research Thesis PT' --> 'Research Thesis'

'Research ThesisPT' --> 'Research ThesisPT'

I found that oracle does not support negative lookahead and lookbehind. Is there a way to acieve this one ?

Upvotes: 0

Views: 51

Answers (2)

MT0
MT0

Reputation: 167774

You can:

  1. Double up the non-alphanumeric characters (which allows you to find consecutive matches without needing to use zero-width look-ahead or look-behind);
  2. Replace PT when it is surrounded by non-alphanumeric characters or the start and/or end of the string.
  3. Remove the second instance of each doubled up non-alphanumeric character.

Like this:

SELECT id,
       str,
       REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE(str, '([^[:alnum:]])', '\1\1'),
           '([^[:alnum:]]|^)PT([^[:alnum:]]|$)',
           '\1\2'
         ),
         '([^[:alnum:]])\1',
         '\1'
       ) AS result
FROM   data;

Which, for the sample data (credit to this answer for the initial data):

CREATE TABLE data(id, str) as
  select 1, 'PTResearch Thesis PT'       from dual union all
  select 2, 'ResePTarch ThesisPT PT bla' from dual union all
  select 3, 'Research Thesis PT'         from dual union all
  select 4, 'Research ThesisPT'          from dual union all
  select 5, 'PT Research Thesis PT'      from dual union all
  select 6, 'PT PT,PT-PTx!PT? PT'        from dual;

Outputs:

ID STR RESULT
1 PTResearch Thesis PT PTResearch Thesis
2 ResePTarch ThesisPT PT bla ResePTarch ThesisPT bla
3 Research Thesis PT Research Thesis
4 Research ThesisPT Research ThesisPT
5 PT Research Thesis PT Research Thesis
6 PT PT,PT-PTx!PT? PT ,-PTx!?

fiddle

Upvotes: 0

p3consulting
p3consulting

Reputation: 4567

As starting point (add more tests):

with data(id, str) as (
    select 1, 'PTResearch Thesis PT' from dual union all
    select 2, 'ResePTarch ThesisPT PT bla' from dual union all
    select 3, 'Research Thesis PT' from dual union all
    select 4, 'Research ThesisPT' from dual union all
    select 5, 'PT Research Thesis PT' from dual -- union all
)
select id, str, listagg(case when word <> 'PT' then word || sep end, '') within group(order by lvl) as res
from (
    select id, str, level as lvl, regexp_substr(str, '([[:alnum:]]+)', 1, level) as word,
        regexp_substr(str, '([^[:alnum:]]+)', 1, level) as sep
    from data 
    connect by prior id = id and regexp_substr(str, '([[:alnum:]]+)', 1, level) is not null
        and prior sys_guid() is not null
)
group by id, str

;

Upvotes: 0

Related Questions