ewok

Reputation: 21443

perl: regex for matching after an optional character

I need to take a string that can have one of 4 formats:

  1. html
  2. text
  3. attachment
  4. email:[address]

I need a regular expression that captures two things: $type, which is html, text, attachment, or email, and $arg, which is [address] if $type is email and undef otherwise. If the string is in none of these formats, there should be no match at all. I've written this regex:

m/(html|email|text|attachment):?(.*)/;

The problem is that it matches even when something trails text, html, or attachment, and it also matches when there is no colon. For instance, a string that begins with email but has no colon still matches, with everything after email captured as the argument. I also tried this one:

m/(html)|(email):(.*)|(text)|(attachment)/;

Which results in 5 groups. Is there a way to capture the way I want, so that I will get no matches if there is no colon after email, or if there IS a colon after something else?

Upvotes: 1

Views: 370

Answers (1)

Casimir et Hippolyte

Reputation: 89547

Yes, to do that you can use the branch reset feature: (?|...|...|...)

/(?|(html)|(email):(.*)|(text)|(attachment))/

In a branch reset, capture groups of each alternative have the same numbers.
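To illustrate the numbering, here is a minimal sketch (branch reset needs Perl 5.10 or later). The variable name $re and the sample address are only placeholders for the example. The type always ends up in $1, and $2 is set only for the email alternative:

```perl
use strict;
use warnings;

# Branch reset: each alternative restarts group numbering at 1,
# so the type is always $1 and the address (if any) is $2.
my $re = qr/(?|(html)|(email):(.*)|(text)|(attachment))/;

for my $input ('html', 'text', 'attachment', 'email:someone@example.com') {
    # In list context the match returns ($1, $2).
    if (my ($type, $arg) = $input =~ $re) {
        printf "%-28s type=%s arg=%s\n", $input, $type, $arg // 'undef';
    }
}
```

Note that this pattern is still unanchored, so it does not yet reject trailing text.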

To exclude "html", "text", or "attachment" followed by anything else (including a colon), you need a condition on the right (an anchor, a lookahead, or similar). The same applies at the beginning.
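One way to add those conditions, as a sketch only: anchor both ends with \A and \z. The helper name parse_format and the sample inputs are made up for the example, and (.*) is kept from the question, so an empty address after the colon would still match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($type, $arg) on success and an empty
# list when the string is in none of the four formats.
sub parse_format {
    my ($input) = @_;
    # \A and \z pin the pattern to the whole string, so "htmlfoo",
    # "text:" and "email" without a colon all fail to match.
    return $input =~ /\A(?|(html)|(email):(.*)|(text)|(attachment))\z/;
}

for my $input ('html', 'email:someone@example.com', 'htmlfoo', 'text:', 'email') {
    my ($type, $arg) = parse_format($input);
    if (defined $type) {
        printf "%-28s type=%s arg=%s\n", $input, $type, $arg // 'undef';
    }
    else {
        printf "%-28s no match\n", $input;
    }
}
```

A lookahead such as (?=\z) or a stricter address pattern in place of (.*) would work along the same lines, depending on what input you expect.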

Upvotes: 3
