Reputation: 4482
I'm trying to build a regular expression to meet these conditions:
[DON'T MATCH]
dont:[email protected]
[MATCH]
mailto:[email protected]
[email protected]
<p>[email protected]</p>
I can match the last two, but the first example (DON'T MATCH) is also matched.
How do I make sure an email is only matched if it's plain or preceded by mailto:, but not by just a :?
http://rubular.com/r/HvldBe4Ew9
Regex:
(?<=mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)
Upvotes: 0
Views: 859
Reputation: 98961
No need for a-zA-Z; just use A-Z and make the regex case-insensitive with re.IGNORECASE.
Also make sure you use the anchors ^ (assert position at the beginning of a line) and $ (assert position at the end of a line).
Python Example:
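import re

# Hypothetical sample input; replace with the string you want to check.
email = "mailto:user@example.com"
match = re.search(r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$", email, re.IGNORECASE)
if match:
    result = match.group(1)  # the address without the optional mailto: prefix
else:
    result = ""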
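To see how an anchored pattern like this treats the four input shapes from the question, here is a small sketch. The addresses in the question are redacted, so user@example.com is a stand-in, and extract_email is a hypothetical helper name, not something from the answer.

```python
import re

# Anchored pattern: optional "mailto:" prefix, then the address, nothing else.
PATTERN = re.compile(
    r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9.-]+)$",
    re.IGNORECASE,
)

def extract_email(text):
    """Return the address if the whole string is [mailto:]address, else ""."""
    match = PATTERN.search(text)
    return match.group(1) if match else ""

# Stand-in samples mirroring the shapes listed in the question.
samples = [
    "dont:user@example.com",
    "mailto:user@example.com",
    "user@example.com",
    "<p>user@example.com</p>",
]
for sample in samples:
    print(repr(sample), "->", repr(extract_email(sample)))
```

Because of the anchors, the <p>...</p> case is rejected along with the dont: case, so this form only suits inputs where each string is a single candidate address.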
Demo:
https://regex101.com/r/cI1eD6/1
Regex explanation:
^(mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$
Options: Case insensitive
Assert position at the beginning of a line «^»
Match the regex below and capture its match into backreference number 1 «(mailto:)?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character string “mailto:” literally «mailto:»
Match the regex below and capture its match into backreference number 2 «([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)»
Match a single character present in the list below «[A-Z0-9_.+-]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “_.+” «_.+»
The literal character “-” «-»
Match the character “@” literally «@»
Match a single character present in the list below «[A-Z0-9-]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
The literal character “-” «-»
Match the character “.” literally «\.»
Match a single character present in the list below «[A-Z0-9-.]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “-.” «-.»
Assert position at the end of a line «$»
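Note that the pattern in this explanation wraps mailto: in a capturing group, so the address ends up in group 2, while a non-capturing (?:mailto:)? leaves it in group 1. A minimal sketch of the difference, using user@example.com as a stand-in address and hypothetical variable names:

```python
import re

# Shared address part of the pattern (hyphen placed last in the final class).
ADDRESS = r"[A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9.-]+"

# Capturing prefix: "mailto:" is group 1, the address is group 2.
capturing = re.compile(rf"^(mailto:)?({ADDRESS})$", re.IGNORECASE)
# Non-capturing prefix: the address is group 1.
non_capturing = re.compile(rf"^(?:mailto:)?({ADDRESS})$", re.IGNORECASE)

text = "mailto:user@example.com"
m1 = capturing.search(text)
m2 = non_capturing.search(text)

print(capturing.groups, m1.group(1), m1.group(2))
print(non_capturing.groups, m2.group(1))
```

With the capturing form, group 1 is None whenever the prefix is absent, so code that reads group(1) expecting the address needs the non-capturing form.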
Upvotes: 0
Reputation: 627082
You can use anchors ^ and $ for matching string start/end if the strings are passed as separate values:
(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)
Or, getting rid of capturing groups:
(?<=>)(?:mailto:)?[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?=<)
See demo
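As a rough illustration of how the lookaround-based pattern behaves, the sketch below runs the capturing variant over a few tagged and untagged strings. The address user@example.com and the helper name find_tagged are assumptions made for the example.

```python
import re

# Address must sit directly after ">" and directly before "<".
TAGGED = re.compile(
    r"(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)"
)

def find_tagged(text):
    """Return the captured address, or None if the pattern does not match."""
    match = TAGGED.search(text)
    return match.group(1) if match else None

for sample in [
    "<p>user@example.com</p>",
    "<p>mailto:user@example.com</p>",
    "<p>dont:user@example.com</p>",
    "user@example.com",
]:
    print(repr(sample), "->", find_tagged(sample))
```

The lookarounds reject the dont: prefix inside a tag, but they also reject an address that is not wrapped in tags at all, so plain strings would need the anchored form instead.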
Please note that you have an issue in [a-zA-Z0-9-.]: the hyphen should not appear unescaped in the middle of a character class. Put it at the end, as in [a-zA-Z0-9.-], or escape it.
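The reason for keeping the hyphen out of the middle is that between two characters it forms a range. A small sketch in Python's re module, with deliberately simplified classes that are not taken from the question:

```python
import re

# Hyphen last: a literal "." and a literal "-".
safe = re.compile(r"[a-z0-9.-]")
# Hyphen between "." and "9": a range covering ".", "/", and "0" to "9".
accidental_range = re.compile(r"[a-z.-9]")

print(bool(safe.fullmatch("/")))              # is "/" allowed by the safe class?
print(bool(accidental_range.fullmatch("/")))  # is "/" allowed by the range?
print(bool(accidental_range.fullmatch("-")))  # is "-" itself still allowed?
```

In the second class the slash is accepted and the hyphen itself is not, which is the kind of silent change the note above warns about.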
Upvotes: 1