Reputation: 1703
Here is my regular expression:
"button:not([DISABLED])".match(/\([^()]+\)|[^()]+/g);
The result is:
["button:not", "([DISABLED])"]
Is it correct? I'm confused. Because the (pipe) operator |
means "or", I think the correct result is:
["button:not", "[DISABLED]", "([DISABLED])"]
Because this:
["button:not", "[DISABLED]"]
is the result of:
"button:not([DISABLED])".match(/[^()]+/g);
and this:
["([DISABLED])"]
is the result of:
"button:not([DISABLED])".match(/\([^()]+\)/g);
But the output in the console tells me the result is:
["button:not", "([DISABLED])"]
Where is the problem?
Upvotes: 38
Views: 60792
Reputation: 135762
The regex
/\([^()]+\)|[^()]+/g
basically says: there are two options, match (1) \([^()]+\) OR (2) [^()]+, wherever you see either of them (/g).
Let's step through your sample string so you can see the reason behind the result.
Starting string:
button:not([DISABLED])
Steps:
1. The engine starts at the first char, b (actually it begins at the start-of-string anchor, ^, but for this example that is irrelevant).
2. Does it begin to match any of the options? b can only match (2), as (1) requires a starting (.
3. Having started on (2), it keeps consuming every char that is not a ( or ).
4. It therefore consumes everything up to and including the t char (because the next char is a ( which does not match [^()]+), thus leaving button:not as the first matched string.
5. Now it is at the (. Does it begin to match any of the options? Yes, the first one: \([^()]+\).
6. Having started on (1), it consumes every char that is not a ( or ) until it finds a ) (if while consuming it finds a ( before a ), it will backtrack, as that means the (1) regex was ultimately not matched).
7. It continues up to the closing ), leaving ([DISABLED]) as the second matched string.

Edit: There's a very useful online tool that lets you see the regex in graphical form. It may help in understanding how the regex works.
You can also move the cursor step by step and see what I tried to explain above: live link.
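The walk-through above can also be reproduced in code. The following is only a sketch: it wraps each alternative of the question's regex in a capturing group (the groups and the names input, pattern and steps are added here for illustration, they are not in the original pattern) so that each match can report where it started and which option produced it.

```javascript
const input = "button:not([DISABLED])";

// Same alternation as in the question, but with each option in its own
// capturing group so we can tell which one matched:
// group 1 = option (1) \([^()]+\), group 2 = option (2) [^()]+
const pattern = /(\([^()]+\))|([^()]+)/g;

const steps = [];
let m;
// With the g flag, exec() resumes at pattern.lastIndex, i.e. right after
// the previous match, which is also how match() with /g walks the string.
while ((m = pattern.exec(input)) !== null) {
  steps.push({
    index: m.index,                       // where this match started
    text: m[0],                           // what was consumed
    option: m[1] !== undefined ? 1 : 2,   // which alternative matched
  });
}

console.log(steps);
// [ { index: 0,  text: "button:not",   option: 2 },
//   { index: 10, text: "([DISABLED])", option: 1 } ]
```

Because the second match consumes everything from index 10 to the end of the string, the engine never gets a chance to start a separate match at the [ inside the parentheses.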
Note about the precedence of expressions separated by |: Due to the way the JavaScript regex engine processes strings, the order in which the expressions appear matters. It will evaluate each alternative in the order they are given. If one of those options is matched to the end, it will not attempt to match any other option, even if it could. Hopefully an example makes it clearer:
"aaa".match(/a|aa|aaa/g); // ==> ["a", "a", "a"]
"aaa".match(/aa|aaa|a/g); // ==> ["aa", "a"]
"aaa".match(/aaa|a|aa/g); // ==> ["aaa"]
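For contrast, here is a small check (the variable names are mine) suggesting that order only matters when more than one alternative can match at the same position. In the regex from the question the two options can never start on the same character, since one requires a ( and the other forbids it, so swapping them should not change the result:

```javascript
const input = "button:not([DISABLED])";

// Original order: parenthesized option first.
const original = input.match(/\([^()]+\)|[^()]+/g);

// Swapped order: plain option first. At "(" the plain option fails
// immediately, so the engine falls through to the parenthesized one.
const swapped = input.match(/[^()]+|\([^()]+\)/g);

console.log(original); // ["button:not", "([DISABLED])"]
console.log(swapped);  // ["button:not", "([DISABLED])"]
```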
Upvotes: 62
Reputation: 33163
A regex returns the matches it finds while scanning left to right, without overlaps, not all possible matches. For that regex the match that starts at the ( is "([DISABLED])", so "[DISABLED]", which is a substring of that match, is not reported separately.
Consider the following example:
"123 456789".match( /[0-9]{4,6}/g )
You want to find the one number that is between 4 and 6 digits long. If the result were all possible numbers that match the regex, it wouldn't be of much use:
[ "4567", "5678", "6789", "45678", "56789", "456789" ] // you don't want this
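For reference, this is what the engine actually returns for that example, as a quick sketch (the lazy variant is my addition, to show that even when a shorter match is preferred the matches still do not overlap):

```javascript
const text = "123 456789";

// Greedy quantifier: at the first position where at least 4 digits are
// available, take as many as allowed (up to 6).
const greedy = text.match(/[0-9]{4,6}/g);
console.log(greedy); // ["456789"]

// Lazy quantifier: stop as soon as the minimum of 4 digits is reached.
// The leftover "89" is too short to match, and nothing already consumed
// is revisited.
const lazy = text.match(/[0-9]{4,6}?/g);
console.log(lazy); // ["4567"]
```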
Upvotes: -1
Reputation: 816462
Your understanding of the alternation operator seems to be incorrect. It does not look for all possible matches, only for the first one that matches (from left to right).
Consider (a | b) as "match either a or b".
See also: http://www.regular-expressions.info/alternation.html
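If what you actually want is the combined list from the question, one possible approach (a sketch, not the only way) is to run the two patterns separately, exactly as the question already does, and concatenate the results. The names plain, parenthesized and combined are made up for this example:

```javascript
const input = "button:not([DISABLED])";

// Each pattern scans the whole string on its own, so a substring can be
// reported by both of them.
const plain = input.match(/[^()]+/g) || [];              // match() returns null when nothing matches
const parenthesized = input.match(/\([^()]+\)/g) || [];

const combined = [...plain, ...parenthesized];
console.log(combined);
// ["button:not", "[DISABLED]", "([DISABLED])"]
```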
Upvotes: 15
Reputation: 98816
I’m not very good at regular expressions, but I think they work by giving you one thing that matches them, rather than all things that could match them.
So, the | operator says: “give me something that matches the left regular expression, or something that matches the right regular expression”.
As your string contains something that matches the left regular expression, you just get that.
Upvotes: 0