Reputation: 3402
I have a simple pattern to match with Regex:
{tag:value=text}
tag, value and text are the parts that I want to capture. The trick is that value is optional (as is the literal ":" before it).
Here are a few samples:
{tag:value=text}
{tag=text}
{tag:=text}
The first line should have "tag" in the "tag" capture group, "value" in the "value" capture group and "text" in the "text" capture group. The two other lines should not have any "value" capture group (or it could be empty).
I have tried variations around the following regex:
{(?<tag>.*):(?<value>.*)?=(?<text>.*)}
This works on samples 1 and 3 but not on the second one.
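The behaviour can be reproduced with a minimal Python sketch. Two things here are assumptions and not part of the original attempt: Python spells named groups (?P<name>...) instead of (?<name>...), and the literal braces are escaped as a precaution.

```python
import re

# The attempted pattern, translated to Python's (?P<name>...) syntax,
# with the literal braces escaped as a precaution.
ATTEMPT = re.compile(r"\{(?P<tag>.*):(?P<value>.*)?=(?P<text>.*)\}")

samples = ["{tag:value=text}", "{tag=text}", "{tag:=text}"]

# Only the "value" group is optional here; the ":" in front of it is not.
attempt_results = {s: ATTEMPT.fullmatch(s) is not None for s in samples}
print(attempt_results)
```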
There can be any number of matches in a given text and I want to grab them all.
Edit: here is a sample of the data I'm trying to match:
Progress: {progress:p1=10%}
Planned duration: {time=10m}
Actors output: {actor:actor1=<nothing to say>}, {actor:actor2=<nothing to say>}
Scene comments: {display=This is a sample scene}
Upvotes: 1
Views: 194
Reputation: 42709
Does this do the trick? It uses the non-greedy modifier ? after each .*, which causes it to match as few characters as possible instead of as many as possible. Since the character that follows is a : or a =, it will stop before reaching them.
{(.*?)(?::(.*?))?=(.*?)}
https://regex101.com/r/fD2eR6/1
Edit: as pointed out below, you're looking for named captures.
{(?<tag>.*?)(?::(?<val>.*?))?=(?<text>.*?)}
Updated URL: https://regex101.com/r/fD2eR6/2
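As a rough check against the sample data from the question, here is a minimal Python sketch of the same lazy pattern. It assumes Python's re module, which writes named groups as (?P<name>...); the braces are escaped as a precaution, and the names LAZY, sample and lazy_matches are only illustrative.

```python
import re

# Lazy version with named groups; Python spells them (?P<name>...).
LAZY = re.compile(r"\{(?P<tag>.*?)(?::(?P<val>.*?))?=(?P<text>.*?)\}")

sample = """Progress: {progress:p1=10%}
Planned duration: {time=10m}
Actors output: {actor:actor1=<nothing to say>}, {actor:actor2=<nothing to say>}
Scene comments: {display=This is a sample scene}"""

# finditer yields every match in the text; an optional group that did not
# take part in the match comes back as None.
lazy_matches = [(m["tag"], m["val"], m["text"]) for m in LAZY.finditer(sample)]
for row in lazy_matches:
    print(row)
```

When the ":value" part is missing, the val group is None in Python rather than an empty string, so calling code may want to treat both the same way.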
Upvotes: 4
Reputation: 89584
Your problem comes from using .*, which is too permissive (and which may cause a lot of backtracking when the pattern works). You can replace each . with the appropriate negated character class:
{(?<tag>[^:=]*)(?::(?<value>[^=]*))?=(?<text>[^}]*)}
With a negated character class you can always use a greedy quantifier, because the set of allowed characters is what stops the quantifier, and the regex engine doesn't have to test at each character whether the next one is a :, a = or a }.
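To see what each group captures, here is a minimal Python sketch of the negated-class pattern run on the three samples from the question and on one of the sample data lines. It assumes Python's re module, so the named groups are written (?P<name>...); the braces are escaped as a precaution, and the variable names are only illustrative.

```python
import re

# Negated character classes: each group stops at the delimiter it excludes.
NEGATED = re.compile(
    r"\{(?P<tag>[^:=]*)(?::(?P<value>[^=]*))?=(?P<text>[^}]*)\}"
)

samples = ["{tag:value=text}", "{tag=text}", "{tag:=text}"]
negated_groups = [NEGATED.fullmatch(s).groupdict() for s in samples]
for groups in negated_groups:
    print(groups)

# Several matches on one line are picked up with finditer.
line = "Actors output: {actor:actor1=<nothing to say>}, {actor:actor2=<nothing to say>}"
actor_values = [m["value"] for m in NEGATED.finditer(line)]
print(actor_values)
```

In this sketch the "value" group is None for {tag=text} and an empty string for {tag:=text}, which covers both outcomes the question allows for.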
Upvotes: 4