Reputation: 4088
The expression:
Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*
correctly produces the following matches:
Group 1. 24-30 494801
Group 2. 38-45 8280955
Group 3. 52-59 8336297
Group 4. 103-109 494767
for the input string:
Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates. Target definition = 494767.
and the first 3 matches for the input string:
Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates.
with JavaScript, Python, PHP, and GoLang flavors (see https://regex101.com/r/Br66wm/3), but does not with SQL regexp-substr:
with
input_string as
(
select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
union all
select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
),
pattern_string as
(
select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*$' as pattern_string from dual
)
select
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 4) as group_4
from
input_string i, pattern_string p;
where the 4th group is always null
. What's wrong with my usage of non-capturing groups? Basically, the following sentence is optional in my input test strings:
Target definition = 494767.
Upvotes: 0
Views: 2169
Reputation: 4088
Because the POSIX-based regex implementation does not seem to support non-capturing groups, and the captured groups of regex_substr
are not readily available as separate columns, I went with the following, which basically uses a different regex for the optional group.
with
input_string as
(
select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
union all
select 'Reassigning definition: 494767 from: [8336297] to: [8369944], advancing dates.' as test_string from dual
),
pattern_string as
(
select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]' as pattern_string from dual
)
select
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
regexp_substr(i.test_string, 'Target definition = (\d+)', 1, 1, null, 1) as group_4
from
input_string i, pattern_string p;
Upvotes: 0
Reputation: 142798
This is a little bit too much to put into a comment so I'll write it here. If it doesn't make sense, I'll just remove it.
If you're always looking for digits in those strings (not related to what surrounds them), then it could be simplified to
SQL> with
2 input_string as
3 (
4 select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
5 union all
6 select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
7 )
8 select regexp_substr(test_string, '\d+', 1, 1) grp1,
9 regexp_substr(test_string, '\d+', 1, 2) grp2,
10 regexp_substr(test_string, '\d+', 1, 3) grp3,
11 regexp_substr(test_string, '\d+', 1, 4) grp4
12 from input_string;
GRP1 GRP2 GRP3 GRP4
---------- ---------- ---------- ----------
494801 8280955 8336297 494767
494801 8280955 8336297
SQL>
Or, option which doesn't have fixed number of groups (though, layout is different from what you wanted):
SQL> with
2 input_string as
3 (
4 select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
5 union all
6 select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
7 )
8 select column_value grp_rn,
9 regexp_substr(test_string, '\d+', 1, column_value) grp
10 from input_String cross join
11 table(cast(multiset(select level from dual
12 connect by level <= regexp_count(test_string, '\d+')
13 ) as sys.odcinumberlist));
GRP_RN GRP
------- ----------
1 494801
2 8280955
3 8336297
4 494767
1 494801
2 8280955
3 8336297
7 rows selected.
Upvotes: 1