Reputation: 365
I have the regex
(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)(?!\w)
Given the string
@first@nope @second@Hello @my-friend, email@ [email protected] @friend
what can I do to exclude the strings @first and @second, since they are not whole words on their own?
In other words, exclude them because they are immediately followed by @.
Upvotes: 0
Views: 122
Reputation: 163207
Another option is to assert a whitespace boundary to the left, and assert no word char or @ sign to the right.
(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])
The pattern matches:
(?<!\S) - Negative lookbehind, assert that there is no non-whitespace char to the left
@ - Match literally
([A-Za-z]+[\w-]+) - Capture group 1, match 1+ chars A-Za-z and then 1+ word chars or -
(?![@\w]) - Negative lookahead, assert that there is no @ or word char to the right
Or match a non-word boundary \B before the @ instead of using a lookbehind.
\B@([A-Za-z]+[\w-]+)(?![@\w])
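A quick way to compare the two patterns is to run them with Python's re module. This is only a sketch: the sample string is an assumption modeled on the question, with the obfuscated address replaced by the placeholder user@example.com, and the "see:@name" input is a made-up case to show where the two left-hand assertions differ.

```python
import re

# Sample text modeled on the question; user@example.com is a placeholder
# (assumption) standing in for the obfuscated address.
text = "@first@nope @second@Hello @my-friend, email@ user@example.com @friend"

# Whitespace boundary on the left, no @ or word char on the right.
lookbehind_pattern = re.compile(r"(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])")
# Same right-hand check, but a non-word boundary \B on the left.
non_boundary_pattern = re.compile(r"\B@([A-Za-z]+[\w-]+)(?![@\w])")

lookbehind_hits = lookbehind_pattern.findall(text)
non_boundary_hits = non_boundary_pattern.findall(text)
print(lookbehind_hits)
print(non_boundary_hits)

# Hypothetical input where the @ directly follows punctuation:
# ":" is a non-whitespace char, so the lookbehind rejects it,
# but ":" is also a non-word char, so \B accepts it.
punct_lookbehind = lookbehind_pattern.findall("see:@name")
punct_non_boundary = non_boundary_pattern.findall("see:@name")
print(punct_lookbehind, punct_non_boundary)
```

On the sample text both patterns should return my-friend and friend. The \B variant is the more permissive of the two, so which one fits depends on whether a mention directly after punctuation should count.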
Upvotes: 1
Reputation: 626689
You can use
(?<![a-zA-Z0-9_.-])@(?=([A-Za-z]+[A-Za-z0-9_-]*))\1(?![@\w])
(?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])
See the regex demo. Details:
(?<![a-zA-Z0-9_.-]) - a negative lookbehind that matches a location that is not immediately preceded by an ASCII letter, digit, _, . or -
@ - a @ char
(?=([A-Za-z]+[A-Za-z0-9_-]*)) - a positive lookahead with a capturing group inside that captures one or more ASCII letters and then zero or more ASCII letters, digits, - or _ chars
\1 - the Group 1 value (backreferences are atomic, no backtracking is allowed through them)
(?![@\w]) - a negative lookahead that fails the match if there is a word char (letter, digit or _) or a @ char immediately to the right of the current location.
Note that I put the hyphens at the end of the character classes, which is best practice: a trailing hyphen cannot be misread as a range operator.
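The first pattern can be tried out in Python as follows. Treat this as a sketch under assumptions: the sample string is modeled on the question with the obfuscated address swapped for the placeholder user@example.com, and the two short inputs are made up to probe the edges of the pattern.

```python
import re

# Sample text modeled on the question; user@example.com is a placeholder
# (assumption) standing in for the obfuscated address.
text = "@first@nope @second@Hello @my-friend, email@ user@example.com @friend"

# Lookahead captures the name, \1 consumes it, and the final lookahead
# rejects names that are followed by @ or a word char.
pattern = re.compile(
    r"(?<![a-zA-Z0-9_.-])@(?=([A-Za-z]+[A-Za-z0-9_-]*))\1(?![@\w])"
)

names = pattern.findall(text)
print(names)

# Hypothetical edge cases:
# a one-letter name is allowed because the second part uses *,
one_letter = pattern.findall("hi @a!")
# and a dot right before the @ blocks the match via the lookbehind.
after_dot = pattern.findall("x.@name")
print(one_letter, after_dot)
```

Because findall returns the contents of Group 1, the results are the names without the leading @.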
The (?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w]) alternative uses shorthand character classes and the (?a) inline modifier (the equivalent of re.ASCII / re.A), which makes \w match only ASCII chars (as in the original version). Remove (?a) if you plan to match any Unicode digits/letters.
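To see what the ASCII flag changes, here is a small Python sketch. The input string is a made-up example (an accented letter directly before the @), not something from the question.

```python
import re

# With (?a), \w only covers ASCII letters, digits and _.
ascii_pattern = re.compile(r"(?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])")
# Passing re.ASCII as a flag should behave the same as the inline (?a).
flag_pattern = re.compile(r"(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])", re.ASCII)
# Without the flag, \w also covers non-ASCII letters and digits.
unicode_pattern = re.compile(r"(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])")

# Hypothetical input: the letter e with an acute accent, then @name.
sample = "\u00e9@name"

ascii_hits = ascii_pattern.findall(sample)      # accent is not an ASCII word char
flag_hits = flag_pattern.findall(sample)
unicode_hits = unicode_pattern.findall(sample)  # accent counts as a word char
print(ascii_hits, flag_hits, unicode_hits)
```

In ASCII mode the accented letter does not count as a word char, so the lookbehind lets the match through; in Unicode mode it does count, and the match is rejected.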
Upvotes: 2