Reputation: 1389
I have lines that contain email addresses and hidden variations of email addresses, for example, use of [at] instead of @. I would like to clean this list of everything that is not an email address.
The TLDs are .com, .us and .me.
Sample input:
[email protected]
johndoe @example.us
contant johndoe @ example . me
my email is [email protected]
[email protected] is my email
this johndoe @ example.com is my mail
johndoe[at]example.com
my email is johndoe [at] example.com
johndoe[at-sign]example.com
johndoe at example.com
johndoe[at-sign]example[dot]com is my mail
Lorem ipsum dolor sit amet, consectetur adipisicing elit, johndoe[at-sign]example[dot]us
johndoe[at-sign]example[dot]me labore et dolore magna aliqua
Sed do eiusmod tempor incididunt johndoe at example dot com
Duis aute irure dolor in reprehenderit in voluptate JOHNDOE at EXAMPLE dot US aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum johndoe at example dot me
I am using Notepad++ search and replace, and my attempt is this:
[\w]+(|\s)(@|at|\[at\]|\[at-sign\])(|\s)[\w]+(|\s)(\.|dot)(|\s)(com|us|me)
It seems to work on everything except lines 11, 12, 13 and 15.
I wrote this on my own; is this the right way?
Desired output:
[email protected]
[email protected]
johndoe @ example . me
[email protected]
[email protected]
[email protected]
johndoe[at]example.com
johndoe [at] example.com
johndoe[at-sign]example.com
johndoe [at-sign] example.com
johndoe[at-sign]example[dot]com
johndoe[at-sign]example[dot]us
johndoe[at-sign]example[dot]me
johndoe at example dot com
JOHNDOE at EXAMPLE dot US
johndoe at exampledotme
I don't expect this to be 100% bulletproof since I have read that e-mail validation can be hard.
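One way to see which sample lines the attempt misses is to run it outside Notepad++. This is a minimal sketch that assumes Python's re module behaves closely enough to Notepad++'s regex engine for this pattern (both support the empty alternatives and escaped brackets it uses). The names ATTEMPT and samples are illustrative; the lines are copied from the sample input above.

```python
import re

# The pattern from the question, as a raw string so the backslashes
# reach the regex engine unchanged.
ATTEMPT = r"[\w]+(|\s)(@|at|\[at\]|\[at-sign\])(|\s)[\w]+(|\s)(\.|dot)(|\s)(com|us|me)"

# A few lines taken from the sample input.
samples = [
    "johndoe @example.us",
    "contant johndoe @ example . me",
    "johndoe[at]example.com",
    "johndoe[at-sign]example[dot]com is my mail",
    "Sed do eiusmod tempor incididunt johndoe at example dot com",
]

for line in samples:
    m = re.search(ATTEMPT, line)  # leftmost match anywhere in the line
    print(repr(line), "->", repr(m.group(0)) if m else None)
```

With this subset, only the [dot] line prints None: the group (\.|dot) has no alternative for the bracketed form.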
Upvotes: 1
Views: 1645
Reputation: 71538
You can simplify your regex a bit. What's wrong with the one you're using is that it doesn't match the square brackets around the dot, as in [dot]:
\w+\s?(?:@|at|\[at(?:-sign)?\])\s?\w+\s?(?:\.|\[dot\]|dot)\s?(?:com|us|me)
                                              ^^^^^^^
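A minimal Python sketch of the simplified pattern, assuming re.IGNORECASE is an acceptable stand-in for leaving Notepad++'s "Match case" box unchecked (without it, the uppercase US in the JOHNDOE line would not match). PATTERN and samples are illustrative names; the lines come from the question's sample input.

```python
import re

# The simplified pattern from this answer. re.IGNORECASE plays the role of an
# unchecked "Match case" option, so "JOHNDOE at EXAMPLE dot US" matches too.
PATTERN = re.compile(
    r"\w+\s?(?:@|at|\[at(?:-sign)?\])\s?\w+\s?(?:\.|\[dot\]|dot)\s?(?:com|us|me)",
    re.IGNORECASE,
)

samples = [
    "johndoe @example.us",
    "contant johndoe @ example . me",
    "my email is johndoe [at] example.com",
    "johndoe[at-sign]example[dot]com is my mail",
    "Duis aute irure dolor in reprehenderit in voluptate JOHNDOE at EXAMPLE dot US aute irure",
]

for line in samples:
    m = PATTERN.search(line)  # first address-like match in the line, if any
    print(m.group(0) if m else None)
```

For these five lines it prints johndoe @example.us, johndoe @ example . me, johndoe [at] example.com, johndoe[at-sign]example[dot]com and JOHNDOE at EXAMPLE dot US, one per line.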
Though if you want to remove everything else on each line and keep only the address, you might use this:
^(?:.*?(\w+ ?(?:@|at|\[at(?:-sign)?\]) ?\w+ ?(?:\.|\[dot\]|dot) ?(?:com|us|me)).*|.*)$
And replace with $1.
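The same full-line replacement can be sketched with Python's re.sub. This assumes re.MULTILINE and re.IGNORECASE approximate Notepad++'s per-line ^ and $ and an unchecked "Match case", and it needs Python 3.5 or later, because earlier versions raise an error when the replacement refers to a group that did not participate in the match. FULL_LINE and text are illustrative names.

```python
import re

# The full-line pattern from this answer.
#   re.MULTILINE: ^ and $ match at every line boundary, as in Notepad++.
#   re.IGNORECASE: stands in for an unchecked "Match case" box.
#   r"\1" is Python's spelling of the $1 replacement.
FULL_LINE = re.compile(
    r"^(?:.*?(\w+ ?(?:@|at|\[at(?:-sign)?\]) ?\w+ ?(?:\.|\[dot\]|dot) ?(?:com|us|me)).*|.*)$",
    re.MULTILINE | re.IGNORECASE,
)

text = """my email is johndoe [at] example.com
this johndoe @ example.com is my mail
Lorem ipsum dolor sit amet
Sed do eiusmod tempor incididunt johndoe at example dot com"""

# A line without an address falls through to the bare .* branch; group 1 is
# then unmatched, and re.sub (Python 3.5+) substitutes it as an empty string.
print(FULL_LINE.sub(r"\1", text))
```

The Lorem ipsum line comes out empty, just as it would after the Notepad++ replacement, so any resulting blank lines need to be removed in a separate step.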
Upvotes: 1