skeggse
skeggse

Reputation: 6323

Javascript greedy regex appears non-greedy

I'd like to match a three-part string. The first part consists of one or more a characters, the second part consists of one or more b characters, and the third part consists either of zero or more c characters or zero or more C characters, but not a mix of c and C.

As such, I wrote the following regular expression:

/a+b+(C*|c*)/

And immediately noticed that it fails to greedily match the trailing cs in the following string:

aaaaabbcc

Wrapping the inner clauses of the or clause does not fix the unexpected behavior:

/a+b+((C*)|(c*))/

But interestingly both regular expressions match the following, where the C characters match the first clause of the or:

aaaaabbCC

The following regular expression captures the semantics accurately, but I'd like to understand why the original regular expression behaves unexpectedly.

/a+b+(([Cc])\2*)?/

Upvotes: 2

Views: 153

Answers (3)

Oriol
Oriol

Reputation: 288250

Your regex doesn't work because first it tries C*, which matches the empty string, so it has satisfied the or clause. Then it won't try to check if c* can match more characters.

Here's a regular expression which does match the string as intended:

/a+b+(C+|c+)?/

That is, if it finds a C it will match as many more C as possible, if it finds a c it will match as many more c as possible. But finding C or c is optional.

Upvotes: 5

Hitmands
Hitmands

Reputation: 14179

var input = "aaaaabbc";

// if you want to pick up c
console.log(/a+b+(c|C)*/.exec(input).pop());

Upvotes: 1

csabinho
csabinho

Reputation: 1609

You have to place the * outside of the bracket!

Upvotes: 2

Related Questions