Reputation: 91
I am having an issue with something that I thought would have been simple, but can't for the life of me get it sorted out. I'm trying to replace everything except a specified string with a blank space.
I am able to remove the string (see below), but am unable to 'inverse' the operation.
with text as (
select 'Cat dog sheep /* 67 = 123 + monkey12 abcd and then a fish B1234.CAT_DOG_MOUSE and half a loaf of bread /* ON INNER JOIN B3456.BIRD_SHOE 11' as jibberish)
select regexp_replace(jibberish, '(B)[[:digit:]]{4}[.][[:alnum:]_]+', ' ') as new_text from text;
'Cat dog sheep /* 67 = 123 + monkey12 abcd and then a fish and half a loaf of bread /* ON INNER JOIN 11'
I have tried various things, such as '[^(B)[[:digit:]]{4}[.][[:alnum:]_]+]*' and various other things that I've found on here, but none of them seem to work. Is this is a quirk of using regex within Oracle?
Just to clarify, after running the code I would be wanting something that looked like this:
' B1234.CAT_DOG_MOUSE B3456.BIRD_SHOE '
Any help would be hugely appreciated.
Thanks!
Upvotes: 2
Views: 2199
Reputation: 3895
The following query does what you want:
with text
as (select 'Cat dog sheep /* 67 = 123 + monkey12 abcd and then a fish B1234.CAT_DOG_MOUSE and half a loaf of bread /* ON INNER JOIN B3456.BIRD_SHOE 11' as jibberish
from dual)
select regexp_replace(jibberish, '(B[[:digit:]]{4}[.][[:alnum:]_]+)|.', '\1 ') as new_text
from text;
The result is:
B1234.CAT_DOG_MOUSE B3456.BIRD_SHOE
It works also for consecutive and trailing matches (but not for overlapping ones).
https://onecompiler.com/plsql/42dctmk4f
Upvotes: 0
Reputation: 626861
You can use
regexp_replace(jibberish, '(B\d{4}\.\w+)?.', '\1 ')
See the regex demo.
Details
(B\d{4}\.\w+)?
- an optional group matching B
, then four digits, a dot and then one or more word chars.
- any one char.The replacement is \1
, the backreference to the value captured by the capturing group.
If there can be consecutive matches, add one more REGEXP_REPLACE
call to append a space right after your expected matches as a workaround:
regexp_replace(regexp_replace(jibberish, '(B\d{4}\.\w+)', '\1 '), '(B\d{4}\.\w+)?.', '\1 ')
This might add a couple extract spaces, though, but all consecutive matches will be found.
Upvotes: 2