Reputation: 21443
I need to take a string that can have one of 4 formats:
html
text
attachment
email:[address]
I need a regular expression that will correctly capture 2 things: the $type, which is html, text, attachment, or email, and the $arg, which is [address] if $type is email, and undef otherwise. If $type is not email, then there should be no matches at all. I've written this regex:
m/(html|email|text|attachment):?(.*)/;
The problem is that it will match even if something trails text, html, or attachment, and it will also match if there is no : after email. So, for instance, [email protected] would give ("email", "[email protected]"). I also tried this one:
m/(html)|(email):(.*)|(text)|(attachment)/;
Which results in 5 groups. Is there a way to capture the way I want, so that I will get no matches if there is no colon after email, or if there IS a colon after something else?
Upvotes: 1
Views: 370
Reputation: 89547
Yes, to do that you can use the branch reset feature: (?|...|...|...)
/(?|(html)|(email):(.*)|(text)|(attachment))/
In a branch reset, capture groups of each alternative have the same numbers.
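As a rough illustration of that numbering (the sample inputs, including the address, are made up, and branch reset needs Perl 5.10 or later), the type always ends up in $1 and the address, when there is one, in $2:

```perl
use strict;
use warnings;
use 5.010;

# Branch reset: each alternative restarts its group numbering at 1,
# so the pattern as a whole has only two capture groups.
my $re = qr/(?|(html)|(email):(.*)|(text)|(attachment))/;

# Sample inputs are assumptions for illustration only.
for my $input ('html', 'text', 'attachment', 'email:someone@example.com') {
    if (my ($type, $arg) = $input =~ $re) {
        my $shown = defined($arg) ? $arg : 'undef';
        print "$input => type=$type, arg=$shown\n";
    }
}
```

For the three bare types the second group does not participate in the match, so $arg comes back as undef.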
To exclude "html", "text", or "attachment" followed by anything else (including a colon), you need a condition on the right (an anchor, a lookahead, or similar). The same applies at the beginning of the string.
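One way to add those conditions, assuming the whole string has to be exactly one of the four formats, is to anchor both ends with \A and \z. This is only a sketch: parse_spec is a made-up helper name, the sample inputs are invented, and using .+ instead of .* (so that a bare "email:" is rejected) is my assumption, not something the question asks for.

```perl
use strict;
use warnings;
use 5.010;

# parse_spec is a hypothetical helper, not from the question.
# \A and \z pin the match to the whole string, so nothing may come
# before a type or trail after html, text, or attachment.
# Assumption: .+ instead of .* so that "email:" with no address fails.
sub parse_spec {
    my ($input) = @_;
    return $input =~ /\A(?|(html)|(email):(.+)|(text)|(attachment))\z/s;
}

# Sample inputs are assumptions for illustration only.
for my $input ('html', 'email:someone@example.com', 'html:oops', 'textual', 'email') {
    my ($type, $arg) = parse_spec($input);
    if (defined($type)) {
        my $shown = defined($arg) ? $arg : 'undef';
        print "$input => type=$type, arg=$shown\n";
    }
    else {
        print "$input => no match\n";
    }
}
```

Called in list context, parse_spec returns ($type, $arg) on success and an empty list on failure, so "html:oops", "textual", and "email" without a colon all fail to match.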
Upvotes: 3