Tobias B

Reputation: 1

Extract string between different special symbols

I have the following string in my query:

.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt

The string begins with a period. I need to extract the segment between the final \ and the period before the file extension, so the expected result is:

ABC__123_123_123_ABC123

I am fairly new to REGEXP and could not work out an elegant (or even workable) solution from the Q&A here or elsewhere. In all of my queries the pattern has the same number and order of segments, but to build my understanding I would prefer not to just count characters and cut.

Upvotes: 0

Views: 449

Answers (3)

Manifest Man

Reputation: 905

Here is a simple example that is fully compatible with Oracle 11g R2, PCRE2, and several other regex flavors.

Oracle 11g R2, using the function regexp_substr (reference documentation):

select
  regexp_substr(
    '.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
    '((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}',
    1, -- position
    1  -- occurrence
  ) substring
from dual;

Pattern: ((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}

Result: ABC__123_123_123_ABC123

This is about as simple as it gets. The pattern uses only basic regular-expression syntax, so it is also portable to other engines, in case someone else is interested in going the simplest way.

Hopefully, this will help you out!

Upvotes: 0

Sayan Malakshinov

Reputation: 8655

You just need regexp_substr and a simple regexp: ([^\]+)\.[^.]*$

select
  regexp_substr(
    '.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
    '([^\]+)\.[^.]*$',
    1,    -- position
    1,    -- occurrence
    null, -- match_parameter
    1     -- subexpr
  ) substring
from dual;

([^\]+)\.[^.]*$ means:

  • ([^\]+) - one or more (+) characters other than a backslash ([] is a set, ^ negates it, i.e. "anything except"), captured as group \1 (subexpression #1)
  • \. - then a literal dot (. is a special character that matches any character, so it has to be escaped with \, the escape character)
  • [^.]* - zero or more characters other than .
  • $ - end of the string

So this regexp means: find a substring that consists of one or more characters other than a backslash, followed by a dot, followed by zero or more characters other than a dot, and that sits at the end of the string. The subexpr parameter = 1 tells Oracle to return the first subexpression (i.e. the first group matched in (...)).

The other parameters are described in the documentation.
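To see how the pattern behaves on more than one input, here is a minimal sketch that runs it over a few rows. The names sample_paths, file_path and base_name are made up for illustration, and only the first path comes from the question; the other two are assumed test values.

```sql
-- Hypothetical sample rows: only the first path is taken from the question.
with sample_paths as (
  select '.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt' as file_path from dual
  union all
  select '.\ABC\report.final.csv' as file_path from dual  -- more than one dot in the file name
  union all
  select '.\ABC\README' as file_path from dual            -- no extension at all
)
select
  file_path,
  -- same call as above: return subexpression #1 of the first match
  regexp_substr(file_path, '([^\]+)\.[^.]*$', 1, 1, null, 1) as base_name
from sample_paths;
```

The first row should give ABC__123_123_123_ABC123. Because [^\]+ is greedy, the second row should keep everything up to the last dot (report.final), and the row without an extension should come back as NULL, since regexp_substr returns NULL when nothing matches.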

Upvotes: 1

Barbaros Özhan

Reputation: 65288

You can use the REGEXP_REPLACE function, such as

REGEXP_REPLACE(col,'(.*\\)(.*)\.(.*)','\2')

in order to extract the piece that starts after the last backslash and runs up to the dot. The leading backslashes in \\ and \. are escape characters: they make the pattern match a literal \ and a literal . instead of treating those characters as special.
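As a minimal sketch, the same call can be tried against the literal from the question instead of a column (col above stands for whatever column holds the path; the alias base_name is made up for illustration):

```sql
select
  regexp_replace(
    '.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
    '(.*\\)(.*)\.(.*)',  -- \1 = path up to the last \, \2 = file name, \3 = extension
    '\2'                 -- replace the whole match with group 2 only
  ) as base_name
from dual;
```

One difference from REGEXP_SUBSTR that is worth keeping in mind: when the pattern does not match, REGEXP_REPLACE returns the input string unchanged rather than NULL.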


Upvotes: 1
