user10022403
user10022403

Reputation: 91

Oracle SQL REGEXP_REPLACE - Everything Except Specified String

I am having an issue with something that I thought would have been simple, but can't for the life of me get it sorted out. I'm trying to replace everything except a specified string with a blank space.

I am able to remove the string (see below), but am unable to 'inverse' the operation.

with text as (
select 'Cat dog sheep /* 67 = 123 + monkey12 abcd and then a fish B1234.CAT_DOG_MOUSE and half a loaf of bread /* ON INNER JOIN B3456.BIRD_SHOE 11' as jibberish)
    
select regexp_replace(jibberish, '(B)[[:digit:]]{4}[.][[:alnum:]_]+', ' ') as new_text from text;

'Cat dog sheep /* 67 = 123 + monkey12 abcd and then a fish   and half a loaf of bread /* ON INNER JOIN   11'

I have tried various things, such as '[^(B)[[:digit:]]{4}[.][[:alnum:]_]+]*' and various other things that I've found on here, but none of them seem to work. Is this is a quirk of using regex within Oracle?

Just to clarify, after running the code I would be wanting something that looked like this:

'              B1234.CAT_DOG_MOUSE                 B3456.BIRD_SHOE       '

Any help would be hugely appreciated.

Thanks!

Upvotes: 2

Views: 2199

Answers (2)

mik
mik

Reputation: 3895

The following query does what you want:

  with text
    as (select 'Cat dog sheep /* 67 = 123 + monkey12 abcd and then a fish B1234.CAT_DOG_MOUSE and half a loaf of bread /* ON INNER JOIN B3456.BIRD_SHOE 11' as jibberish
          from dual)
select regexp_replace(jibberish, '(B[[:digit:]]{4}[.][[:alnum:]_]+)|.', '\1 ') as new_text
  from text;

The result is:

                                                           B1234.CAT_DOG_MOUSE                                            B3456.BIRD_SHOE    

It works also for consecutive and trailing matches (but not for overlapping ones).

https://onecompiler.com/plsql/42dctmk4f

Upvotes: 0

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626861

You can use

regexp_replace(jibberish, '(B\d{4}\.\w+)?.', '\1 ')

See the regex demo.

Details

  • (B\d{4}\.\w+)? - an optional group matching B, then four digits, a dot and then one or more word chars
  • . - any one char.

The replacement is \1, the backreference to the value captured by the capturing group.

If there can be consecutive matches, add one more REGEXP_REPLACE call to append a space right after your expected matches as a workaround:

regexp_replace(regexp_replace(jibberish, '(B\d{4}\.\w+)', '\1 '), '(B\d{4}\.\w+)?.', '\1 ')

This might add a couple extract spaces, though, but all consecutive matches will be found.

Upvotes: 2

Related Questions