user2155362

Reputation: 1703

Why doesn’t the alternation (pipe) operator ( | ) in JavaScript regular expressions give me two matches?

Here is my regular expression:

"button:not([DISABLED])".match(/\([^()]+\)|[^()]+/g);

The result is:

["button:not", "([DISABLED])"]

Is that correct? I'm confused. Since the pipe operator | means "or", I would expect the result to be:

["button:not", "[DISABLED]", "([DISABLED])"] 

Because this:

["button:not", "[DISABLED]"]

is the result of:

"button:not([DISABLED])".match(/[^()]+/g);

and this:

["([DISABLED])"]

is the result of:

"button:not([DISABLED])".match(/\([^()]+\)/g);

But the console output tells me the result is:

["button:not", "([DISABLED])"]

Where is the problem?

Upvotes: 38

Views: 60792

Answers (4)

acdcjunior

Reputation: 135762

The regex

/\([^()]+\)|[^()]+/g

basically says: there are two options, match (1) \([^()]+\) OR (2) [^()]+, wherever either of them is found (/g).

Let's step through your sample string so you can see the reason behind the result.

Starting string:

button:not([DISABLED])

Steps:

  • The cursor begins at the character b (strictly speaking, at the position just before it, where a start-of-string anchor ^ would match, but that is irrelevant for this example).
  • Of the two available options, b can only match (2), since (1) requires a leading (.
    • Now that it has begun to match (2), it keeps going, consuming everything that is not a ( or ).
    • So it consumes everything up to and including the t (the next character is a (, which [^()]+ cannot match), leaving button:not as the first matched string.
  • Now the cursor is at (. Does it begin to match any of the options? Yes, the first one: \([^()]+\).
    • Again, now that it has begun to match (1), it follows it through: after the (, it consumes everything that is not a ( or ) and then requires a ). (If it ran into a ( before reaching a ), option (1) would fail at this position and the engine would backtrack.)
    • Here it consumes [DISABLED] and then the closing ), leaving ([DISABLED]) as the second matched string.
  • Since we have reached the last character, the regex processing ends.
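
The steps above can be checked with a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020). The names input, pattern and steps are only illustrative. It records where each match starts and ends, which shows that the second match begins exactly where the first one stopped, so the engine never starts a fresh attempt at the [ inside the parentheses:

```javascript
const input = "button:not([DISABLED])";
const pattern = /\([^()]+\)|[^()]+/g;

// For every match, keep the matched text plus its start and end offsets.
const steps = [...input.matchAll(pattern)].map((m) => ({
  text: m[0],
  start: m.index,
  end: m.index + m[0].length,
}));

console.log(steps);
// [ { text: "button:not", start: 0, end: 10 },
//   { text: "([DISABLED])", start: 10, end: 22 } ]
```

Because a global search resumes at the end of the previous match, the characters between offsets 10 and 22 are consumed once, by option (1), and are not offered to option (2) again.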



Edit: There's a very useful online tool that lets you see the regex in graphical form. It may help you understand how the regex works:

[Figure: graphical form of the regular expression]

You can also move the cursor step by step and see what I tried to explain above: live link.

Note about the precedence of expressions separated by |: Because of the way the JavaScript regex engine processes strings, the order in which the alternatives appear matters. At each position it tries the alternatives in the order they are given. If one of those options matches all the way through, it will not attempt any other option at that position, even if another one could match too. Hopefully an example makes it clearer:

"aaa".match(/a|aa|aaa/g); // ==> ["a", "a", "a"]
"aaa".match(/aa|aaa|a/g); // ==> ["aa", "a"]
"aaa".match(/aaa|a|aa/g); // ==> ["aaa"]

Upvotes: 62

JJJ

Reputation: 33163

Regex finds the best match, not all possible matches. The best match for that regex is "([DISABLED])", not "[DISABLED]" which is a subset of the "better" match.

Consider the following example:

"123 456789".match( /[0-9]{4,6}/g )

You want to find the one number that is between 4 and 6 digits long. If the result were every possible substring that matches the regex, it wouldn't be of much use:

[ "4567", "5678", "6789", "45678", "56789", "456789" ]   // you don't want this
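
For comparison, here is a quick sketch of what the engine actually returns for that example (the variable name digits is only illustrative):

```javascript
// "123" is too short for {4,6}; at "4" the greedy quantifier takes all six digits.
const digits = "123 456789".match(/[0-9]{4,6}/g);

console.log(digits); // ["456789"]
```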

Upvotes: -1

Felix Kling

Reputation: 816462

Your understanding of the alternation operator seems to be incorrect. It does not look for all possible matches, only for the first one that matches (from left to right).

Consider (a | b) as "match either a or b".
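
As a rough sketch of the difference, assuming the goal really is the three-element list from the question: the alternation yields one match per position, whereas running the two patterns separately and concatenating the results collects the matches of both. The names input, either, outside, wrapped and both are only illustrative.

```javascript
const input = "button:not([DISABLED])";

// Alternation: at each position, one alternative wins.
const either = input.match(/\([^()]+\)|[^()]+/g) || [];

// Two independent searches over the same string, joined afterwards.
const outside = input.match(/[^()]+/g) || [];
const wrapped = input.match(/\([^()]+\)/g) || [];
const both = outside.concat(wrapped);

console.log(either); // ["button:not", "([DISABLED])"]
console.log(both);   // ["button:not", "[DISABLED]", "([DISABLED])"]
```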

See also: http://www.regular-expressions.info/alternation.html

Upvotes: 15

Paul D. Waite

Reputation: 98816

I’m not very good with regular expressions, but I think they work by giving you one thing that matches them, rather than all the things that could match them.

So, the | operator says: “give me something that matches the left regular expression, or something that matches the right regular expression”.

As your string contains something that matches the left regular expression, you just get that.

Upvotes: 0
