okoboko

Reputation: 4482

regex for plain email and mailto links, but not http basic auth

I'm trying to build a regular expression to meet these conditions:

[DON'T MATCH]

dont:[email protected]

[MATCH]

mailto:[email protected]
[email protected]
<p>[email protected]</p>

I can match the last two, but the first example (DON'T MATCH) is also matched.

How do I make sure an email is matched only if it's plain or preceded by mailto:, but not by just a :?

http://rubular.com/r/HvldBe4Ew9

Regex:

(?<=mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)

Upvotes: 0

Views: 859

Answers (2)

Pedro Lobito

Reputation: 98961

No need for a-zA-Z; just use A-Z and make the regex case-insensitive with re.IGNORECASE.
Also make sure you anchor the pattern with:

^ Assert position at the beginning of a line
and
$ Assert position at the end of a line


Python Example:

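import re

# Hypothetical sample input; replace with the string you want to test.
email = "mailto:user@example.com"

# ^ and $ anchor the pattern to the whole string; (?:mailto:)? makes the prefix optional.
match = re.search(r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$", email, re.IGNORECASE)
if match:
    result = match.group(1)  # the address without the mailto: prefix
else:
    result = ""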

Demo:

https://regex101.com/r/cI1eD6/1


Regex explanation:

^(mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$

Options: Case insensitive

Assert position at the beginning of a line «^»
Match the regex below and capture its match into backreference number 1 «(mailto:)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character string “mailto:” literally «mailto:»
Match the regex below and capture its match into backreference number 2 «([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)»
   Match a single character present in the list below «[A-Z0-9_.+-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “_.+” «_.+»
      The literal character “-” «-»
   Match the character “@” literally «@»
   Match a single character present in the list below «[A-Z0-9-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      The literal character “-” «-»
   Match the character “.” literally «\.»
   Match a single character present in the list below «[A-Z0-9-.]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “-.” «-.»
Assert position at the end of a line «$»
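
As a quick check, the anchored pattern can be run against inputs shaped like the ones in the question. This is only a sketch: the addresses are hypothetical placeholders, and extract_email is a helper name introduced here for illustration. Because ^ and $ require the whole string to be the address, the HTML-wrapped sample is not matched by this pattern.

```python
import re

# Anchored pattern from the answer above; the capture group holds the bare address.
PATTERN = re.compile(
    r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9.-]+)$",
    re.IGNORECASE,
)

def extract_email(text):
    """Return the address if the whole string is an email (optionally mailto:-prefixed), else ""."""
    match = PATTERN.search(text)
    return match.group(1) if match else ""

# Hypothetical sample inputs mirroring the shapes in the question.
samples = [
    "dont:user@example.com",
    "mailto:user@example.com",
    "user@example.com",
    "<p>user@example.com</p>",
]
results = {s: extract_email(s) for s in samples}
```

With these inputs, the dont: sample and the tagged sample both come back empty, while the plain and mailto: samples return the bare address.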

Upvotes: 0

Wiktor Stribiżew

Reputation: 627082

You can use anchors ^ and $ for matching string start/end if the strings are passed as separate values:

(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)

Or, getting rid of capturing groups:

(?<=>)(?:mailto:)?[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?=<)

See demo
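
A minimal Python sketch of the lookaround version, assuming the address sits directly between a ">" and a "<" as in the question's HTML sample. The sample strings and variable names are hypothetical. Note that this pattern relies on the surrounding tag characters rather than on ^ and $, so it does not match an address outside of tags.

```python
import re

# Pattern without capturing groups: the match must directly follow ">" and precede "<".
TAG_PATTERN = re.compile(
    r"(?<=>)(?:mailto:)?[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?=<)"
)

html = "<p>user@example.com</p>"   # hypothetical address inside a tag
plain = "dont:user@example.com"    # no enclosing tags, so nothing should match

found_in_html = TAG_PATTERN.findall(html)
found_in_plain = TAG_PATTERN.findall(plain)
```

Since (?:mailto:)? is non-capturing and part of the overall match, a mailto: prefix inside the tag would be included in the returned text; use the capturing variant if only the bare address is wanted.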

Please note that you have an issue in [a-zA-Z0-9-.]: the hyphen should not appear unescaped in the middle of a character class. Put it at the end, as in [a-zA-Z0-9.-], or escape it.
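
To illustrate the hyphen placement, here is a small sketch of two unambiguous ways to write the class in Python's re module. The sample string and variable names are hypothetical.

```python
import re

# Two unambiguous spellings of the same character class:
hyphen_last = re.compile(r"[a-zA-Z0-9.-]+")      # hyphen as the final character
hyphen_escaped = re.compile(r"[a-zA-Z0-9\-.]+")  # hyphen escaped in the middle

domain_tail = "co-op.example"  # hypothetical domain fragment

last_ok = hyphen_last.fullmatch(domain_tail) is not None
escaped_ok = hyphen_escaped.fullmatch(domain_tail) is not None
```

Both forms treat the hyphen as a literal character, so neither can be misread as the start of a range.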

Upvotes: 1
