Reputation: 1982
What is the default operator precedence in Oracle's regular expressions when they don't contain parentheses?
For example, given `H|ha+`, would it be evaluated as `H|h` and then concatenated to `a`, as in `((H|h)a)`, or would the `H` be alternated with `ha`, as in `(H|(ha))`?
Also, when does the `+` kick in, etc.?
Upvotes: 30
Views: 37934
Reputation: 5886
Using capturing groups to demonstrate the order of evaluation, the regex `H|ha+` is equivalent to the following:
(H|(h(a+)))
This is because the precedence rules (listed in the table below) are applied in order, from the highest precedence (lowest number) to the lowest precedence (highest number):
Rule 5 → (a+)
The `+` is grouped with the `a` because this operator works on the preceding single character, back-reference, group (a "marked sub-expression" in Oracle parlance), or bracket expression (character class).
Rule 6 → (h(a+))
The `h` is then concatenated with the group from the preceding step.
Rule 8 → (H|(h(a+)))
The `H` is then alternated with the group from the preceding step.
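The grouping above can be checked empirically. The following is a minimal sketch using Python's `re` module rather than Oracle's engine; it assumes only that both engines rank duplication, concatenation, and alternation in the same order. The helper `full_matches` and the sample strings are made up for illustration.

```python
import re

# Sample inputs chosen to tell the candidate groupings apart (illustrative only).
SAMPLES = ["H", "ha", "haaa", "Ha", "h", "haha"]


def full_matches(pattern, samples):
    """Return the samples that the pattern matches in their entirety."""
    compiled = re.compile(pattern)
    return [s for s in samples if compiled.fullmatch(s)]


implicit = full_matches(r"H|ha+", SAMPLES)        # no parentheses
explicit = full_matches(r"(H|(h(a+)))", SAMPLES)  # grouping from the answer
alt_first = full_matches(r"(H|h)a+", SAMPLES)     # alternation applied first
concat_first = full_matches(r"((H|h)a)+", SAMPLES)  # + applied to a whole group

print(implicit)      # same list as explicit
print(explicit)
print(alt_first)     # differs: accepts "Ha", rejects "H"
print(concat_first)  # differs: accepts "haha", rejects "haaa"
```

Only the fully parenthesized form `(H|(h(a+)))` accepts the same strings as the bare pattern; the other two groupings disagree on `H`, `Ha`, `haaa`, or `haha`.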
Precedence table from section 9.4.8 of the POSIX docs for regular expressions (there doesn't seem to be an official Oracle table):
+---+----------------------------------------------------------+
|   | ERE Precedence (from high to low)                        |
+---+-----------------------------------+----------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+
The table above is for Extended Regular Expressions. For Basic Regular Expressions see 9.3.7.
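One consequence of the table that often surprises people is that anchoring (rule 7) binds more tightly than alternation (rule 8), so `^a|b$` reads as `(^a)|(b$)` and not as `^(a|b)$`. Here is a small sketch of that, again in Python's `re` as a stand-in for Oracle's engine (an assumption; the helper `search_hits` and the sample strings are hypothetical):

```python
import re

# Illustrative inputs: exact "a"/"b", plus strings that only start or end right.
SAMPLES = ["a", "b", "axx", "xxb", "xax"]


def search_hits(pattern, samples):
    """Return the samples in which the pattern is found anywhere."""
    compiled = re.compile(pattern)
    return [s for s in samples if compiled.search(s)]


bare = search_hits(r"^a|b$", SAMPLES)          # no parentheses
per_branch = search_hits(r"(^a)|(b$)", SAMPLES)  # anchors bound to each branch
whole = search_hits(r"^(a|b)$", SAMPLES)       # anchors around the alternation

print(bare)        # same list as per_branch
print(per_branch)
print(whole)       # only the exact strings "a" and "b"
```

If the intent is "the whole string is either a or b", the parentheses in `^(a|b)$` are required.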
Upvotes: 24
Reputation: 29471
Given the Oracle doc:
Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.
And taking a look at the `|` entry in that table:
The expression a|b matches character a or character b.
Plus taking a look at the POSIX doc:
Operator precedence
The order of precedence of operators is as follows:
1. Collation-related bracket symbols: [==] [::] [..]
2. Escaped characters: \
3. Character set (bracket expression): []
4. Grouping: ()
5. Single-character-ERE duplication: * + ? {m,n}
6. Concatenation
7. Anchoring: ^ $
8. Alternation: |
I would say that `H|ha+` would be the same as `(?:H|ha+)`.
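A quick way to sanity-check that claim is to run both forms over the same text and compare what they find. This sketch uses Python's `re`, which accepts the Perl-style `(?:...)` non-capturing syntax; whether a given Oracle release accepts that syntax is not something it checks, and the sample text is made up.

```python
import re

# Illustrative text containing matching and non-matching tokens.
text = "H ha haaa Ha h"

plain = re.findall(r"H|ha+", text)
wrapped = re.findall(r"(?:H|ha+)", text)
# Fully nested non-capturing groups, making every precedence level explicit.
nested = re.findall(r"(?:H|(?:h(?:a+)))", text)

print(plain)
print(wrapped)
print(nested)
```

All three calls return the same list, which is consistent with the alternation being the outermost (lowest precedence) operation.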
Upvotes: 17