Reputation: 6323
I'd like to match a three-part string. The first part consists of one or more `a` characters, the second part consists of one or more `b` characters, and the third part consists either of zero or more `c` characters or zero or more `C` characters, but not a mix of `c` and `C`.
As such, I wrote the following regular expression:
/a+b+(C*|c*)/
And immediately noticed that it fails to greedily match the trailing `c`s in the following string:
aaaaabbcc
Wrapping each alternative of the or clause in its own group does not fix the unexpected behavior:
/a+b+((C*)|(c*))/
But interestingly both regular expressions match the following, where the `C` characters match the first clause of the or:
aaaaabbCC
The following regular expression captures the semantics accurately, but I'd like to understand why the original regular expression behaves unexpectedly.
/a+b+(([Cc])\2*)?/
Upvotes: 2
Views: 153
Reputation: 288250
Your regex doesn't work because it first tries `C*`, which matches the empty string, so the or clause is already satisfied. The rest of the pattern then succeeds, so the engine has no reason to backtrack and never checks whether `c*` could match more characters.
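To see this in action, here is a minimal sketch, assuming a backtracking engine like JavaScript's. The helper name `firstMatch` is made up for illustration. It also shows that swapping the alternatives just moves the problem to the other string:

```javascript
// Hypothetical helper: returns the overall match and the first capture group.
function firstMatch(regex, input) {
  const m = regex.exec(input);
  return m === null ? null : { whole: m[0], group: m[1] };
}

// C* is tried first and succeeds with the empty string, so c* is never tried.
console.log(firstMatch(/a+b+(C*|c*)/, "aaaaabbcc")); // whole: "aaaaabb", group: ""

// With the alternatives swapped, c* is tried first and consumes "cc" ...
console.log(firstMatch(/a+b+(c*|C*)/, "aaaaabbcc")); // whole: "aaaaabbcc", group: "cc"

// ... but now the upper-case string is the one that gets cut short.
console.log(firstMatch(/a+b+(c*|C*)/, "aaaaabbCC")); // whole: "aaaaabb", group: ""
```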
Here's a regular expression which does match the string as intended:
/a+b+(C+|c+)?/
That is, if it finds a `C` it will match as many more `C`s as possible; if it finds a `c` it will match as many more `c`s as possible. But finding `C` or `c` is optional.
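A quick sketch of how that behaves on a few inputs. The wrapper `matchTail` is a made-up name for illustration, not part of any API:

```javascript
// Hypothetical helper wrapping the regex from this answer.
function matchTail(input) {
  const m = /a+b+(C+|c+)?/.exec(input);
  // m[1] is undefined when the optional group did not take part in the match.
  return m === null ? null : { whole: m[0], tail: m[1] };
}

console.log(matchTail("aaaaabbcc")); // whole: "aaaaabbcc", tail: "cc"
console.log(matchTail("aaaaabbCC")); // whole: "aaaaabbCC", tail: "CC"
console.log(matchTail("aaaaabb"));   // whole: "aaaaabb",   tail: undefined
console.log(matchTail("aaaaabbcC")); // whole: "aaaaabbc",  tail: "c"
```

Note that the pattern is not anchored, so a mixed tail such as `cC` is not rejected; the match simply stops at the first character that breaks the run. Appending `$` would be one way to reject such input, if that is what you need.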
Upvotes: 5
Reputation: 14179
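var input = "aaaaabbc";
// If you want to pick up the trailing c: pop() returns the last capture group, here "c".
// Note that (c|C)* also accepts a mix of c and C.
console.log(/a+b+(c|C)*/.exec(input).pop());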
Upvotes: 1