happytobe
happytobe

Reputation: 21

Need to extract text with REGEXP_SUBSTR - cannot find the right combination

Here is an example of my text- I am trying to get TEXTPART3 AS THE ANSWER:

TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4

I used TRIM(LEADING ':' FROM REGEXP_SUBSTR('textstatementhere', ':.+?-')) - but it is not accounting for the two ":" and the "-" in the text statement I get ' TEXTPART2: TEXTPART3 -'

Can anyone help?

Thanks in advance!

Upvotes: 0

Views: 309

Answers (2)

user5683823
user5683823

Reputation:

The problem has a more efficient solution using only standard string functions:

with
  sample_input (str) as (
    select 'TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4' from dual
  )
select substr(str, pos, instr(str, '-', pos) - pos - 1) as text_part_3
from   (select str, instr(str, ':', 1, 2) + 2 as pos from sample_input)
;


TEXT_PART_3
-----------
TEXTPART3

Upvotes: 1

Patrick Bacon
Patrick Bacon

Reputation: 4660

Leverage the REGEXP_SUBSTR Function to Do All of Your Work

Use a subexpression, (\w), and reference it:

WITH exmple AS (
    SELECT
        'TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4' txt
    FROM
        dual
)
SELECT
    txt,
    regexp_substr(txt, ': (\w*) -', 1, 1, NULL,
                  1)
FROM
    exmple;

I see that you used ., in lieu of \w. Because you chose the meta-character,. (which represents all characters except new line (though that can be included if "n" is set as a pattern matching modifier)), the second colon is thrown in to the matching set.

What does TEXTPART3 include?

Perhaps the meta-character, \w, (which stands for alphanumeric or underscore (_) character), is not what you need.

You could replace it with a non-matching character list to avoid the problems with .:

[^:].

This approach would look like this:

REGEXP_SUBSTR(txt, ': ([^:]*) -',1,1,NULL,1)

Lastly, with the quantiifier associated with this subexpression, I used * which means zero or more matches. I assumed that there would be instances where there could be zero matches for this TEXTPART3. If this is not the case, we can use +.

Upvotes: 0

Related Questions