Reputation: 4117
I have a RegEx here and I need to know whether it will reject 100% of bad email addresses. I do not fully understand regular expressions, so I need to call on the community experts.
The string is as follows:
^[_a-zA-Z0-9-]+(.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(.[a-zA-Z0-9-]+)*(.[a-zA-Z]{2,3})$
Thank you in advance!
Upvotes: 0
Views: 3853
Reputation: 29
To all the writers above who pointed out that the . accepts any character: I found, while writing a response to another RegEx question, that this edit-capture widget eats backslashes. (IT'S A PROBLEM!)
OK... Let's write it correctly:
^\s*([_a-zA-Z0-9]+(\.[_a-zA-Z0-9\-\%]+)*)@([a-zA-Z0-9]+(\.[a-zA-Z0-9\-]+)*(\.[a-zA-Z]{2,4}))\s*$
This also allows the % character inside the address. The problem with this routine is that, while it actually does a pretty good job of parsing email addresses, it is not very efficient: RegEx quantifiers are "greedy", so the part before the terminating condition (which is supposed to match things like .com and .edu) will overshoot and then need to backtrack, costing considerable CPU time.
The real answer is to use the routines that are specific to this, as other posters have recommended. But if you don't have the CPAN modules, or the target environment does not, then the RegEx hack is arguably acceptable.
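For anyone who wants to try the pattern, here is a minimal sketch in Python (used only for illustration; the regex itself is not Python-specific). It assumes the intended pattern uses single backslashes, i.e. \. for a literal dot and a plain * quantifier. The helper name split_address is made up for this example.

```python
import re

# Assumption: this is the pattern the answer intended, with single
# backslashes (a literal dot is \. and * is an ordinary quantifier).
PATTERN = re.compile(
    r"^\s*([_a-zA-Z0-9]+(\.[_a-zA-Z0-9\-\%]+)*)"
    r"@([a-zA-Z0-9]+(\.[a-zA-Z0-9\-]+)*(\.[a-zA-Z]{2,4}))\s*$"
)


def split_address(text):
    """Return (local_part, domain) if text matches PATTERN, else None."""
    m = PATTERN.match(text)
    return (m.group(1), m.group(3)) if m else None


# Surrounding whitespace is tolerated and 4-letter TLDs are accepted.
print(split_address("  john.doe@example.info  "))
# As written, % is only accepted in segments that follow a dot.
print(split_address("john.d%e@example.com"))
print(split_address("jo%hn@example.com"))
# TLDs longer than four letters are still refused.
print(split_address("john@example.museum"))
```

Note that the pattern still refuses TLDs longer than four letters and a % in the first segment of the local part, so it remains a hack rather than a validator.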
Upvotes: -2
Reputation: 14328
Perhaps this regular expression will do?
^[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})$
Taken from http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/
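A quick way to see what this pattern does and does not accept, sketched in Python for illustration only. The sample addresses and the helper name accepts are invented for this example.

```python
import re

# The pattern quoted above, split over two lines for readability.
LINKED_PATTERN = re.compile(
    r"^[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*"
    r"@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})$"
)


def accepts(address):
    """True if the whole address matches LINKED_PATTERN."""
    return LINKED_PATTERN.match(address) is not None


# Hypothetical sample addresses, not taken from the linked article.
for sample in (
    "bob+tag@example.info",        # plus sign and a long TLD
    "a@a@a@a@aa",                  # several @ signs
    '"Mickey Mouse"@example.com',  # quoted local part
    "bob@[1.2.3.4]",               # IP-literal domain
):
    print(sample, accepts(sample))
```

So it handles the + and long-TLD cases and refuses the multiple-@ string, but it still refuses quoted local parts and IP-literal domains.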
Upvotes: 0
Reputation: 6553
Please, please, don't try to validate email addresses using regular expressions; this is a wheel that does not need re-inventing, and unless you write a horrendously hairy regular expression, you will let through invalid email addresses or reject valid ones.
There are plenty of modules on CPAN like Email::Valid which will take care of it all for you and are tried-and-tested.
Simple example:
use Email::Valid;
print (Email::Valid->address('[email protected]') ? 'yes' : 'no');
Much simpler, and will just work.
Alternatively, using Mail::RFC822::Address:
use Mail::RFC822::Address;
if (Mail::RFC822::Address::valid('[email protected]')) { ...}
For an example of how hairy a regular expression would have to be to successfully handle all RFC822-compliant addresses, take a look at this beauty.
People who try to hand-roll their own email address validation tend to end up with code that lets syntactically-invalid addresses slip through and, perhaps worse, rejects perfectly valid addresses.
For example, some people use + in their address, like [email protected] - this is known as an "address tag" or "sub-addressing". Quite a few naive attempts at validation would refuse that, and the customer will end up going elsewhere.
Also, in the past some people assumed the TLD would always be 2 or 3 characters; when e.g. .info was launched, people with addresses at such domains were told their perfectly-valid email address wasn't acceptable.
Finally, there are some pathological cases such as "Mickey Mouse"@example.com and bob@[1.2.3.4], which are syntactically-valid but which most people's hand-rolled validation would refuse.
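To make the point concrete, here is a small Python sketch (Python only for illustration) that runs the pattern from the question, with its dots escaped as presumably intended, against the kinds of address described above. The bob+tag and bob@example.info addresses are made-up stand-ins, and NAIVE and rejected are names invented for this example.

```python
import re

# The question's pattern with the dots escaped (an assumption about intent).
NAIVE = re.compile(
    r"^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*"
    r"@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,3})$"
)

SAMPLES = [
    "bob@example.com",             # plain address
    "bob+tag@example.com",         # sub-addressing
    "bob@example.info",            # four-letter TLD
    '"Mickey Mouse"@example.com',  # quoted local part
    "bob@[1.2.3.4]",               # IP-literal domain
]


def rejected(addresses):
    """Return the addresses that NAIVE refuses, in their original order."""
    return [a for a in addresses if NAIVE.match(a) is None]


print(rejected(SAMPLES))
```

Only the plain address gets through; the other four are all refused.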
Upvotes: 16
Reputation: 111860
^[_a-zA-Z0-9-]+(.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(.[a-zA-Z0-9-]+)*(.[a-zA-Z]{2,3})$
Piece by piece
^ Start of the string
[_a-zA-Z0-9-]+ One or more characters of "_" (no quotes), a letter (a-z, A-Z), a number (0-9), or "-" (no quotes)
(.[_a-zA-Z0-9-]+)* Zero or more substrings of the form .something, .123, or .a123. Each substring must consist of a . followed by one or more characters from the same class as before. So "." on its own is not valid; ".a", ".1" or ".-" is.
(up until now it will accept, for example, my.name12 or my.name12.surname34)
@ a "@" (like max@something)
[a-zA-Z0-9-]+ One or more characters with the same pattern as before
(.[a-zA-Z0-9-]+)* Zero or more substrings of type ".something"... just as before
(.[a-zA-Z]{2,3}) A "." (dot) and 2 or 3 letters (a-z or A-Z)
$ The end of the string
So we have an email address where you can't have [email protected] (no "dangling" dot before the @) or [email protected] (no beginning dot). The domain must start with a letter, a digit or a hyphen and can't have a dot just before the first-level domain (.com/.uk/??), so no [email protected]. The first-level domain must have 2 or 3 letters (no numbers).
There is an error: the . (dot) must be escaped, so it should be \. instead. Depending on the language, the \ must itself be escaped in a string (so it could be \\.)
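The breakdown above can be sketched piece by piece in Python (used only for illustration), with the dot escaped. The piece names and the sample addresses are invented for this example; Python raw strings let you write \. directly, while an ordinary string literal needs \\. for the same two characters.

```python
import re

# Each piece of the pattern, with the dot escaped as \.
LOCAL_START = r"[_a-zA-Z0-9-]+"       # first run of allowed characters
LOCAL_REST = r"(\.[_a-zA-Z0-9-]+)*"   # zero or more ".something" groups
DOMAIN_START = r"[a-zA-Z0-9-]+"       # first domain label
DOMAIN_REST = r"(\.[a-zA-Z0-9-]+)*"   # further ".label" groups
TLD = r"(\.[a-zA-Z]{2,3})"            # a dot and 2 or 3 letters

ESCAPED = re.compile(
    "^" + LOCAL_START + LOCAL_REST + "@" + DOMAIN_START + DOMAIN_REST + TLD + "$"
)

# A raw string and a doubled backslash spell the same regex fragment.
SAME_FRAGMENT = r"\." == "\\."


def accepts(address):
    """True if the whole address matches the escaped pattern."""
    return ESCAPED.match(address) is not None


print(accepts("my.name12.surname34@something.com"))
print(accepts("my.@example.com"))    # dangling dot before the @
print(accepts(".my@example.com"))    # leading dot
print(accepts("me@example..com"))    # dot right before the TLD dot
```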
Upvotes: 8
Reputation: 46187
No, it will not exclude 100% of bad email addresses. Short of rejecting all addresses, this is impossible for a regex to accomplish because the vast majority of syntactically-valid addresses are for accounts which do not exist, such as [email protected].
The only way to truly verify the legitimacy of an email address is to attempt to send mail to it - and even that will only tell you that mail is accepted at that address, not that it is received by a human (as opposed to being fed to a script or silently discarded) and, even if it is received by a human, you have no guarantee that it's the human who claimed to own it. ("You insist that I have to give you a deliverable email address? Fine. My email address is [email protected].")
Upvotes: 2
Reputation: 46
[_a-zA-Z0-9-]
Means you only want these characters (any alphanumeric character, '-' or '_') in your email address, but an address can also be valid with any of these characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~
The first part (before the @) can be at most 64 characters long ({1,64}) and the second part (after the @) at most 253 characters long ({4,253}). (Add parentheses around the first or second group before applying the count limit.)
If you want to know the standard for email addresses, see the Wikipedia article on email addresses.
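A rough sketch of such a check in Python (for illustration only). It assumes the usually quoted limits of 64 characters for the part before the @ and 253 for the part after it, and it only looks at unquoted local parts. The function name plausible_address is invented here, and the domain is checked for length only.

```python
import re

# Letters, digits, the dot, and the special characters listed above.
LOCAL_CHARS = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~.]+")

MAX_LOCAL = 64    # assumed limit for the part before the @
MAX_DOMAIN = 253  # assumed limit for the part after the @


def plausible_address(address):
    """Check character set and length limits only; not a full validator."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if len(local) > MAX_LOCAL or len(domain) > MAX_DOMAIN:
        return False
    return LOCAL_CHARS.fullmatch(local) is not None


print(plausible_address("bob+tag@example.com"))
print(plausible_address("a" * 65 + "@example.com"))
```

Checking the lengths in code rather than inside the regex keeps the pattern readable.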
Upvotes: 2
Reputation: 41
Simple answer: it won't.
Apart from the fact that a bad email address isn't necessarily wrongly formatted ([email protected] is correctly formatted but is still bad), the RegEx will accept some badly formatted addresses as well.
For example, the right-most part ((.[a-zA-Z]{2,3})$) states that the verified string should end with a dot and then two or three letters. This will accept non-existent top-level domain names (e.g. .aa) and will block four-letter TLDs (e.g. .info)
Upvotes: 4
Reputation: 174309
If I see it correctly, the following would be valid according to your regex: a@a@a@a@aa
An unescaped dot matches any character!
Additionally, the following valid email address would not be accepted, although it should:
Someone%[email protected]
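This is easy to check. The Python sketch below (Python only for illustration) compares the pattern exactly as posted with a version where the dots are escaped. The someone%dept address is a made-up stand-in for the example above, and the names UNESCAPED and ESCAPED are invented here.

```python
import re

# Exactly as posted in the question: every dot is unescaped.
UNESCAPED = re.compile(
    r"^[_a-zA-Z0-9-]+(.[_a-zA-Z0-9-]+)*"
    r"@[a-zA-Z0-9-]+(.[a-zA-Z0-9-]+)*(.[a-zA-Z]{2,3})$"
)

# The same pattern with each dot escaped so that it matches only ".".
ESCAPED = re.compile(
    r"^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*"
    r"@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,3})$"
)

multi_at = "a@a@a@a@aa"
percent = "someone%dept@example.com"  # hypothetical address

# The unescaped dot lets "@" stand in for the dot, so this is accepted.
print(bool(UNESCAPED.match(multi_at)), bool(ESCAPED.match(multi_at)))

# With the dot escaped, % is refused; unescaped, the dot can swallow it.
print(bool(UNESCAPED.match(percent)), bool(ESCAPED.match(percent)))
```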
Upvotes: 6