Rafal_PL
Rafal_PL

Reputation: 51

How to extract substring matching particular pattern from text?

I have text column named 'LOG_TEXT' and I need to extract substring matching the rule: Start from 'CID:', then can be space or not, then text I am interested in, it ends with either ',' or ' ' or end of string. How to build expression to do it using regexp_substr ?

examples of source text and required output:
"some trashes CID: T17654, rest of trashes" -> required output "T17654";
"some trashes CID: 2876543298 rest of trashes" -> required output "2876543298";
"some trashes CID:2876543298 rest of trashes" -> required output "2876543298";
"some trashes CID: A-876" -> required output "A-876"

Anyone can help?

Upvotes: 1

Views: 70

Answers (1)

user5683823
user5683823

Reputation:

with
  sample_data (log_text) as (
    select 'some trashes CID: T17654, rest of trashes'    from dual union all
    select 'some trashes CID: 2876543298 rest of trashes' from dual union all
    select 'some trashes CID:2876543298 rest of trashes'  from dual union all
    select 'some trashes CID: A-876'                      from dual
  )
select log_text, regexp_substr(log_text, 'CID: ?([^ ,]*)', 1, 1, null, 1) as result
from   sample_data
;

LOG_TEXT                                     RESULT         
-------------------------------------------- ---------------
some trashes CID: T17654, rest of trashes    T17654         
some trashes CID: 2876543298 rest of trashes 2876543298     
some trashes CID:2876543298 rest of trashes  2876543298     
some trashes CID: A-876                      A-876   

The WITH clause is for testing only.

The regular expression searches for CID: followed optionally by one space ?(if there is any number of spaces, replace the question mark with a star). Then, in parentheses, we have [^ ,]* which means the maximum number of consecutive characters other than space and comma. This thing in parentheses (technically a "capture group") is the substring you are after. To return this capture group, use the sixth argument to regexp_substr - it's the first capture group (the only one, and therefore the first), and 1 as the sixth argument means return the first capture group.

Upvotes: 1

Related Questions