Reputation: 455
I want to capture a group in an optional part of a string.
For example:
In the string "firstName:Bill-lastName:Gates", I want to capture two groups, "Bill" and "Gates".
I use this regex:
firstName:(.*)-lastName:(.*)
But when the lastName-part is optional, I still want to capture the first group (firstName).
I used this regex, to make the lastName-part optional (in a non-capturing group):
firstName:(.*)(?:-lastName:(.*))?
Using this updated regex, the resulting groups are as follows.
When the lastName part is not present, for example "firstName:Bill", the only captured group is "Bill",
which is correct.
When the firstName and lastName parts are both present, as in "firstName:Bill-lastName:Gates", the groups are not correct: the first group captures "Bill-lastName:Gates" and the second group captures nothing.
I think it has to do with the greediness of the first capturing group. How can I adjust the regex so that it still works when the lastName part is optional?
Upvotes: 2
Views: 962
Reputation: 4649
Even though you accepted @dognose's answer already, I assure you there are first names with a dash in them (You don't wanna piss off Jean-Claude van Damme). I would advise you to do it like so:
firstName:((?:(?!-lastName:).)*)(?:-lastName:(.*))?
The (?:(?!-lastName:).) part says "if '-lastName:' does not start at the current position, consume another character". The outer group repeats that step and captures everything it consumed.
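As a rough illustration of how this pattern behaves, here is a minimal Python sketch using the standard re module. The function name parse_name and the sample strings are made up for this example; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: group 1 consumes one character at a time,
# but only while "-lastName:" does not start at the current position.
PATTERN = re.compile(r"firstName:((?:(?!-lastName:).)*)(?:-lastName:(.*))?")


def parse_name(text):
    """Return (first_name, last_name); last_name is None when it is absent.

    parse_name is a hypothetical helper written for this illustration.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_name("firstName:Bill-lastName:Gates"))             # ('Bill', 'Gates')
print(parse_name("firstName:Bill"))                            # ('Bill', None)
print(parse_name("firstName:Jean-Claude-lastName:van Damme"))  # ('Jean-Claude', 'van Damme')
```

The dash inside "Jean-Claude" is consumed normally because it is not the start of "-lastName:", so the first group only stops at the real separator.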
Upvotes: 2
Reputation: 20889
You are right, it is about greediness. Find a delimiter for the first capture group. If the first name never contains a dash, let the first group match everything except a dash.
firstName:([^-]*)(?:-lastName:(.*))?
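A short Python sketch of the negated character class, assuming the standard re module. The helper name split_name and the inputs are invented for this example. The last call shows what happens when the "no dash in the first name" assumption does not hold.

```python
import re

# [^-]* stops at the first dash, so group 1 can never swallow "-lastName:".
PATTERN = re.compile(r"firstName:([^-]*)(?:-lastName:(.*))?")


def split_name(text):
    """Return (first_name, last_name); split_name is a hypothetical helper."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_name("firstName:Bill-lastName:Gates"))  # ('Bill', 'Gates')
print(split_name("firstName:Bill"))                 # ('Bill', None)

# Assumption violated: the first name contains a dash, so group 1 is cut
# short at "Jean" and the optional lastName part no longer lines up.
print(split_name("firstName:Jean-Claude-lastName:van Damme"))  # ('Jean', None)
```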
If you cannot find such a delimiter, you need to take a different approach. Making the first group "lazy", with (.*?) in place of (.*), does not help: the lastName part is optional, so the engine never needs it for the overall match to succeed.
This is because a lazy group stops at the first (shortest) string that satisfies the expression (important wording!), which here is the empty string.
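To make the greedy versus lazy behaviour concrete, here is a small Python comparison using the standard re module. The helper groups_for is a made-up name, and the lazy pattern is an assumption about what "making the first pattern lazy" looks like, namely (.*?) in place of (.*).

```python
import re

TEXT = "firstName:Bill-lastName:Gates"

GREEDY = r"firstName:(.*)(?:-lastName:(.*))?"
LAZY = r"firstName:(.*?)(?:-lastName:(.*))?"  # assumed lazy variant


def groups_for(pattern, text):
    """Return the captured groups, or None if there is no match (hypothetical helper)."""
    match = re.match(pattern, text)
    return match.groups() if match else None


# Greedy: group 1 runs to the end of the string, and the optional part is skipped.
print(groups_for(GREEDY, TEXT))  # ('Bill-lastName:Gates', None)

# Lazy: group 1 stops immediately, and the optional part is skipped as well,
# because "-lastName:" does not start right after "firstName:".
print(groups_for(LAZY, TEXT))    # ('', None)
```

Neither variant yields the wanted split, which is why the first group needs an explicit boundary or a different pattern structure.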
There might be an option with lookarounds, but you could also use an alternation (an "or") instead of an optional group:
firstName:(.*)-lastName:(.*)|firstName:(.*)
This way, the regex engine matches one alternative or the other, but prefers the pattern with two groups since it is listed first. Only if that one does not match will it try the single-group pattern.
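One practical consequence of the alternation is that the first name lands in group 1 or group 3, depending on which branch matched. A minimal Python sketch of handling that, assuming the standard re module (parse_with_alternation is a hypothetical helper name):

```python
import re

# Two-group branch first, single-group branch as the fallback.
PATTERN = re.compile(r"firstName:(.*)-lastName:(.*)|firstName:(.*)")


def parse_with_alternation(text):
    """Return (first_name, last_name); last_name is None when it is absent."""
    match = PATTERN.match(text)
    if match is None:
        return None
    if match.group(1) is not None:
        # First branch matched: groups 1 and 2 are set, group 3 is None.
        return match.group(1), match.group(2)
    # Second branch matched: only group 3 is set.
    return match.group(3), None


print(parse_with_alternation("firstName:Bill-lastName:Gates"))  # ('Bill', 'Gates')
print(parse_with_alternation("firstName:Bill"))                 # ('Bill', None)
```

Because the greedy (.*) in the first branch backtracks to the last "-lastName:" in the string, a first name containing a dash should still come out whole with this pattern.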
Upvotes: 4