Stefan Ramson

Reputation: 541

Empty match on end token due to greediness

I want to parse the space around items of a comma-separated list (the indices are the interesting part for me). Imagine parsing the arguments of a function call.

I'm using the regex ^\s*|\s*,\s*|\s*,?\s*$ for this, which works as intended in most cases. In particular, the beginning and the end should each produce an empty match if and only if there is no whitespace there (and, for the end, no comma either). For example, foo has two matches, one at 0-0 and one at 3-3.
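A minimal sketch to reproduce the index pairs described above. The helper name spans and the constant listSpace are hypothetical, introduced only for illustration; String.prototype.matchAll requires the g flag on the regex.

```javascript
// Collect [start, end] index pairs for every match of a global regex.
const spans = (re, text) =>
  [...text.matchAll(re)].map(m => [m.index, m.index + m[0].length]);

// The regex from the question, with the global flag.
const listSpace = /^\s*|\s*,\s*|\s*,?\s*$/g;

console.log(spans(listSpace, "foo"));      // [[0, 0], [3, 3]]
console.log(spans(listSpace, "foo, bar")); // [[0, 0], [3, 5], [8, 8]]
```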

Unfortunately, a non-empty match at the end of the input is also followed by an empty match at the very end. Consider the following example (at regex):

[Image: screenshot of the match results, not included]

Here, the fifth match (23-23) is unintended. I assume this match is found due to the greedy nature of *. However, one cannot apply the ? operator to the end token $ to make it non-greedy.
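Since the screenshot is not available, here is a smaller reproduction of the same effect. The input string is an assumption chosen for illustration (it is not the one from the screenshot), and the names original, input, and found are hypothetical.

```javascript
// The regex from the question, with the global flag.
const original = /^\s*|\s*,\s*|\s*,?\s*$/g;

// Illustrative input: two trailing spaces, total length 10.
const input = "foo, bar  ";

// Record [start, end, matchedText] for every match.
const found = [];
for (const m of input.matchAll(original)) {
  found.push([m.index, m.index + m[0].length, m[0]]);
}

console.log(JSON.stringify(found));
// [[0,0,""],[3,5,", "],[8,10,"  "],[10,10,""]]
// The last entry is the unintended empty match at the very end.
```

After the non-empty match at 8-10, the global search resumes at index 10, where the alternative \s*,?\s*$ can still match the empty string.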

Is there a way to express my intended behavior (without the empty match at the end) using JavaScript regexes?

Edit: here are some examples (using _ instead of spaces for clarity)

Upvotes: 0

Views: 44

Answers (1)

Barmar

Reputation: 782285

Add a negative lookahead to the middle alternative so it doesn't match at the end.

And put a negative lookbehind in the last alternative so you won't get two matches when there's whitespace at the end.

^\s*|\s*,\s*(?!\s+$)|(?<![\s,])\s*,?\s*$
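A quick way to check this pattern against a few inputs, as a sketch. The names fixed and spansOf and the sample strings are hypothetical, not taken from the question. Lookbehind assertions need an engine with ES2018 regex support.

```javascript
// The pattern from this answer, with the global flag.
const fixed = /^\s*|\s*,\s*(?!\s+$)|(?<![\s,])\s*,?\s*$/g;

// Collect [start, end] index pairs for every match in the given text.
const spansOf = text =>
  [...text.matchAll(fixed)].map(m => [m.index, m.index + m[0].length]);

console.log(spansOf("foo"));        // [[0, 0], [3, 3]]
console.log(spansOf("foo, bar  ")); // [[0, 0], [3, 5], [8, 10]]
console.log(spansOf("foo,  "));     // [[0, 0], [3, 6]]
```

In each case the trailing whitespace (and comma, if present) is reported once, with no extra empty match at the end, while foo still gets its empty match at 3-3.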


Upvotes: 1
