domenu

Reputation: 455

Captured group in optional part of a regular expression

I want to capture a group in an optional part of a string.

For example:

In the string "firstName:Bill-lastName:Gates", I want to capture two groups:

  1. Bill
  2. Gates

I use this regex:

firstName:(.*)-lastName:(.*)

But when the lastName-part is optional, I still want to capture the first group (firstName).

I used this regex, to make the lastName-part optional (in a non-capturing group):

firstName:(.*)(?:-lastName:(.*))?

Using this updated regex, the resulting groups are:

which is correct,

I think it has to do with the greediness of the first capturing group, but how can I adjust this regex so that it still works when the lastName-part is optional?
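For reference, the behaviour can be reproduced in a few lines of Python. The question does not name a language, so Python's re module and the helper name capture are assumptions made purely for illustration.

```python
import re

# Pattern from the question: greedy first group, optional lastName part.
pattern = re.compile(r"firstName:(.*)(?:-lastName:(.*))?")

def capture(text):
    """Return both captured groups as a tuple, or None if nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None

# Without a lastName part the first group is what we want.
print(capture("firstName:Bill"))                 # ('Bill', None)

# With a lastName part the greedy (.*) swallows the rest of the string,
# and the optional group is then skipped.
print(capture("firstName:Bill-lastName:Gates"))  # ('Bill-lastName:Gates', None)
```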

Upvotes: 2

Views: 962

Answers (2)

asontu

Reputation: 4649

Even though you accepted @dognose's answer already, I assure you there are first names with a dash in them (You don't wanna piss off Jean-Claude van Damme). I would advise you to do it like so:

    firstName:((?:(?!-lastName:).)*)(?:-lastName:(.*))?


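The (?:(?!-lastName:).)* part says: "as long as the current position is not the start of '-lastName:', consume one more character". The first group therefore stops right before '-lastName:', or at the end of the input if that part is absent, and any dashes inside the first name are kept.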
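A quick way to check this is a small Python sketch. Python's re module and the helper name split_name are my assumptions here, since the question does not say which regex flavour is in use.

```python
import re

# First group: repeat "any character that does not start '-lastName:'".
pattern = re.compile(r"firstName:((?:(?!-lastName:).)*)(?:-lastName:(.*))?")

def split_name(text):
    """Return (first_name, last_name); last_name is None when it is absent."""
    m = pattern.search(text)
    return m.groups() if m else None

print(split_name("firstName:Bill-lastName:Gates"))             # ('Bill', 'Gates')
print(split_name("firstName:Bill"))                            # ('Bill', None)
# A dash inside the first name no longer cuts the group short.
print(split_name("firstName:Jean-Claude-lastName:van Damme"))  # ('Jean-Claude', 'van Damme')
```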

Upvotes: 2

dognose

Reputation: 20889

You are right, it is about greediness. The first match group needs a delimiter to stop at. So, if your first name never contains a dash, let the first group match everything except a dash:

firstName:([^-]*)(?:-lastName:(.*))?
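As a rough check, here is how that pattern behaves in Python. The re module and the helper name split_on_dash are assumptions for illustration; the last call shows what happens when the "no dash in the first name" assumption does not hold.

```python
import re

# First group may contain anything except a dash.
pattern = re.compile(r"firstName:([^-]*)(?:-lastName:(.*))?")

def split_on_dash(text):
    """Return (first_name, last_name); last_name is None when it is absent."""
    m = pattern.search(text)
    return m.groups() if m else None

print(split_on_dash("firstName:Bill-lastName:Gates"))  # ('Bill', 'Gates')
print(split_on_dash("firstName:Bill"))                 # ('Bill', None)
# Limitation: a dash inside the first name ends the group early.
print(split_on_dash("firstName:Jean-Claude-lastName:van Damme"))  # ('Jean', None)
```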



If you cannot find such a delimiter, you need a different approach. Making the first group "lazy", as in (.*?), does not help on its own.

This is because a lazy match group matches the first string that satisfies the expression (important wording!). Everything after the first group is optional, so the expression is already satisfied while the first group is still empty; the match ends there and the lastName part is skipped.
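The effect is easy to see in Python (again an assumed flavour). The sketch also shows that the lazy variant does behave once the whole string is forced to match; using re.fullmatch for that is my own addition, not something taken from the answer.

```python
import re

text = "firstName:Bill-lastName:Gates"
lazy = re.compile(r"firstName:(.*?)(?:-lastName:(.*))?")

# Unanchored: the lazy group is satisfied with zero characters, the
# optional part does not fit at that position, so it is skipped.
unanchored = lazy.search(text).groups()
print(unanchored)  # ('', None)

# fullmatch requires the match to reach the end of the string, which
# forces the lazy group to grow until the rest of the pattern fits.
anchored = lazy.fullmatch(text).groups()
print(anchored)    # ('Bill', 'Gates')

only_first = lazy.fullmatch("firstName:Bill").groups()
print(only_first)  # ('Bill', None)
```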

There might be an option with lookarounds, but you could also use an alternation (an "or" statement) instead of an optional group:

firstName:(.*)-lastName:(.*)|firstName:(.*)

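This way, the regex engine matches one alternative or the other, but tries the pattern with two groups first because it is listed first. Only if that one fails does it fall back to the single-group pattern. Note that the first name then ends up in group 1 or in group 3, depending on which alternative matched.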
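A minimal Python sketch of the alternation, assuming Python's re module; the helper name parse_name is hypothetical. Because each alternative has its own capture groups, the caller has to look at which groups were actually filled.

```python
import re

# Two-group alternative first, single-group fallback second.
pattern = re.compile(r"firstName:(.*)-lastName:(.*)|firstName:(.*)")

def parse_name(text):
    """Return (first_name, last_name); last_name is None when it is absent."""
    m = pattern.search(text)
    if m is None:
        return None
    if m.group(1) is not None:
        # First alternative matched: groups 1 and 2 are set.
        return m.group(1), m.group(2)
    # Fallback alternative matched: only group 3 is set.
    return m.group(3), None

print(parse_name("firstName:Bill-lastName:Gates"))  # ('Bill', 'Gates')
print(parse_name("firstName:Bill"))                 # ('Bill', None)
```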

Upvotes: 4
