mystarrocks
mystarrocks

Reputation: 4088

Oracle SQL regexp_substr non-capturing/optional group

The expression:

Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*

correctly produces the following matches:

Group 1.    24-30   494801
Group 2.    38-45   8280955
Group 3.    52-59   8336297
Group 4.    103-109 494767

for the input string:

Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates. Target definition = 494767.

and the first 3 matches for the input string:

Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates.

with JavaScript, Python, PHP, and GoLang flavors (see https://regex101.com/r/Br66wm/3), but does not with SQL regexp-substr:

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
   ),
   pattern_string as
   (
     select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*$' as pattern_string from dual
   )
select
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 4) as group_4
from
  input_string i, pattern_string p;

where the 4th group is always null. What's wrong with my usage of non-capturing groups? Basically, the following sentence is optional in my input test strings:

 Target definition = 494767.

Upvotes: 0

Views: 2169

Answers (2)

mystarrocks
mystarrocks

Reputation: 4088

Because the POSIX-based regex implementation does not seem to support non-capturing groups, and the captured groups of regex_substr are not readily available as separate columns, I went with the following, which basically uses a different regex for the optional group.

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494767 from: [8336297] to: [8369944], advancing dates.' as test_string from dual
   ),
   pattern_string as
   (
     select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]' as pattern_string from dual
   )
select
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
  regexp_substr(i.test_string, 'Target definition = (\d+)', 1, 1, null, 1) as group_4
from
  input_string i, pattern_string p;

Upvotes: 0

Littlefoot
Littlefoot

Reputation: 142798

This is a little bit too much to put into a comment so I'll write it here. If it doesn't make sense, I'll just remove it.

If you're always looking for digits in those strings (not related to what surrounds them), then it could be simplified to

SQL> with
  2    input_string as
  3    (
  4      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
  5      union all
  6      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
  7     )
  8  select regexp_substr(test_string, '\d+', 1, 1) grp1,
  9         regexp_substr(test_string, '\d+', 1, 2) grp2,
 10         regexp_substr(test_string, '\d+', 1, 3) grp3,
 11         regexp_substr(test_string, '\d+', 1, 4) grp4
 12  from input_string;

GRP1       GRP2       GRP3       GRP4
---------- ---------- ---------- ----------
494801     8280955    8336297    494767
494801     8280955    8336297

SQL>

Or, option which doesn't have fixed number of groups (though, layout is different from what you wanted):

SQL> with
  2    input_string as
  3    (
  4      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
  5      union all
  6      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
  7     )
  8  select column_value grp_rn,
  9         regexp_substr(test_string, '\d+', 1, column_value) grp
 10  from input_String cross join
 11    table(cast(multiset(select level from dual
 12                        connect by level <= regexp_count(test_string, '\d+')
 13                       ) as sys.odcinumberlist));

 GRP_RN GRP
------- ----------
      1 494801
      2 8280955
      3 8336297
      4 494767
      1 494801
      2 8280955
      3 8336297

7 rows selected.

Upvotes: 1

Related Questions