Reputation: 2996
I have a very specific set of rules that I need to use to validate email addresses. I've tried the Apache Commons library as well as the JavaMail library; although both of those adhere to RFC 2822, some emails that are invalid according to my rules get through. I have been trying my luck with regexes (regexi?) to no avail. I know, I know. A regex isn't the best option and can take a lot of time and add complications. Still, since my rules are outlined in fairly simple terms, I figured that building one for this specific case would suffice.
So far I have been trying to use the following regex:
^((?!.\.{2,}.)[^.][-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+[^.])@((?!.\-{2,}.)[^-_][-a-zA-Z0-9_.]+[^-_]\.[a-zA-z]+)$
This is still failing with invalid emails (e.g. [email protected]).
What am I missing or doing wrong with the regex? Is there another way I can ensure an email conforms to the requirements without a regex?
Thanks in advance!
P.S. This is in Java, so all escaped characters in the above regex have to be double escaped (e.g. \. is \\.). I have also been using Regexper to help me visualize this since I am obviously no regex guru.
Upvotes: 0
Views: 538
Reputation: 55609
I suggest:
Split on the @ symbol.
Split on the last period (using String#substring and String#lastIndexOf).
Now you have the local part, the domain and the TLD all in separate strings, use if-statements to validate.
If there are any rules applicable to all (2 consecutive periods?), do that before splitting.
Much simpler to get right, much simpler to understand, much simpler to maintain.
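A minimal sketch of that split-then-check approach. The question never lists its actual rules, so every check below is an assumption read off the regex in the question (no consecutive periods, local part not starting or ending with a period, domain not starting or ending with - or _, no consecutive hyphens, letters-only TLD). The class and method names are hypothetical, and splitting on the last @ assumes the local part may itself contain an @, as the question's character class suggests.

```java
class EmailSplitCheck {

    // Hypothetical rules inferred from the question's regex; replace with the real rule set.
    static boolean isValid(String email) {
        // Rule that applies to the whole address: no two consecutive periods.
        if (email == null || email.contains("..")) {
            return false;
        }

        // Split on the last '@' (assumption: the local part may contain '@').
        int at = email.lastIndexOf('@');
        if (at < 1) {
            return false;
        }
        String local = email.substring(0, at);
        String rest = email.substring(at + 1);

        // Split on the last period to separate the domain from the TLD.
        int dot = rest.lastIndexOf('.');
        if (dot < 1 || dot == rest.length() - 1) {
            return false;
        }
        String domain = rest.substring(0, dot);
        String tld = rest.substring(dot + 1);

        return validLocal(local) && validDomain(domain) && validTld(tld);
    }

    static boolean validLocal(String local) {
        return !local.startsWith(".")
                && !local.endsWith(".")
                && local.matches("[-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~]+");
    }

    static boolean validDomain(String domain) {
        char first = domain.charAt(0);
        char last = domain.charAt(domain.length() - 1);
        return first != '-' && first != '_'
                && last != '-' && last != '_'
                && !domain.contains("--")
                && domain.matches("[-a-zA-Z0-9_.]+");
    }

    static boolean validTld(String tld) {
        return tld.matches("[a-zA-Z]+");
    }
}
```

Each rule sits on its own line, so adding, removing or reordering a rule doesn't risk breaking the others.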
But, if you really want to stick to regex, here's a few things I've seen:
The [^.] before the @ should be (?<!\.), otherwise the last character before the @ can be just about anything.
. is just one character, so (?!.\-{2,}.) and (?!.\.{2,}.) don't do what you think they do. Changing the leading . to .* seems to fix it. You also don't need to check any characters after the thing you're looking for.
It hasn't been explicitly stated, but I presume the domain and TLD can't contain 2 successive periods either. If they are allowed there, the first part of the regex needs to be (?!.*\.{2,}.*@) so that the check stops at the @.
If you use String#matches, the ^ and $ aren't required.
There are some unneeded ()'s.
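To see why the original lookahead lets consecutive periods through, here is a small sketch comparing the two forms in isolation. The class name, method names and sample addresses are made up for illustration.

```java
import java.util.regex.Pattern;

class LookaheadDemo {

    // Original form: the leading '.' consumes exactly one character, so ".." is
    // only rejected when it begins at the second character of the input.
    static final Pattern NARROW = Pattern.compile("^(?!.\\.{2,}.).*$");

    // Fixed form: '.*' lets the lookahead scan the whole input from the start.
    static final Pattern WIDE = Pattern.compile("^(?!.*\\.{2,}).*$");

    static boolean passesNarrow(String s) {
        return NARROW.matcher(s).matches();
    }

    static boolean passesWide(String s) {
        return WIDE.matcher(s).matches();
    }
}
```

With these patterns, "a..b@x.com" is rejected by both, but "ab..c@x.com" passes the narrow check and is only caught by the wide one.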
Final regex:
(?!.*\.{2,})[^.][-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+(?<!\.)@(?!.*\-{2,})[^-_][-a-zA-Z0-9_.]+[^-_]\.[a-zA-z]+
If you choose to stick to regex, I suggest extensive commenting:
String regex =
      "(?!.*\\.{2,})" // doesn't contain 2 consecutive .'s
    // local part
    + "[^.]" // doesn't start with a .
    + "[-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+" // valid chars for local part
    + "(?<!\\.)" // last char of local part isn't a .
    // at symbol
    + "@"
    // domain
    ...
It might seem like overkill, but you'll wish you had if you try to maintain it a few months down the line, especially if you haven't touched any regex in those months.
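For reference, here is one way the fully commented version might look, assembled from the final regex above. The class and method names are hypothetical. One deliberate difference: the TLD class is written [a-zA-Z] rather than [a-zA-z], on the assumption that the latter is a typo (the range A-z also matches [, \, ], ^, _ and `).

```java
import java.util.regex.Pattern;

class CommentedEmailRegex {

    static final String REGEX =
          "(?!.*\\.{2,})"                        // no 2 consecutive periods anywhere
        // local part
        + "[^.]"                                 // doesn't start with a period
        + "[-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+"  // valid chars for the local part
        + "(?<!\\.)"                             // last char of local part isn't a period
        // at symbol
        + "@"
        // domain
        + "(?!.*-{2,})"                          // no 2 consecutive hyphens after the @
        + "[^-_]"                                // doesn't start with - or _
        + "[-a-zA-Z0-9_.]+"                      // valid chars for the domain
        + "[^-_]"                                // doesn't end with - or _
        // TLD
        + "\\."                                  // the last period
        + "[a-zA-Z]+";                           // letters only (assumed fix of [a-zA-z])

    static final Pattern EMAIL = Pattern.compile(REGEX);

    static boolean isValid(String email) {
        // matches() anchors at both ends, so ^ and $ are not needed
        return EMAIL.matcher(email).matches();
    }
}
```

Note that, as written, this pattern needs at least two characters in the local part and at least three in the domain, so something like a@b.com is rejected. Whether that matches the intended rules is something only the asker can confirm.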
Upvotes: 2
Reputation: 17707
The common wisdom is that e-mail addresses are too complex for a single regex. It is easier to check an e-mail address by seeing if an SMTP server can deliver to it. You have already been told that.
So, assuming you need to pre-validate an address (and assuming it is only the email portion, and not all the goodies you can have like unicode names, etc.) then my recommendation would be:
This is the only realistic way to leave a somewhat reasonable system that's maintainable and understandable by the poor sucker who sees the code next time.
e.g.
// Validates a single dot-separated piece of the name (local part).
private void validateNamePart(String npart) {
    if (!npart.matches("")) {
        throw new .....;
    }
}

// Validates the whole name by checking each dot-separated piece.
private void validateName(String name) {
    int parts = 0;
    for (String npart : name.split("\\.")) {
        validateNamePart(npart);
        parts++;
    }
    if (parts == 0) {
        throw ....;
    }
}

private void validateDomainPart(String dpart) {
    ....
}

private void validateDomain(String domain) {
    ....
}

// Entry point: exactly one @ is expected, separating name and domain.
public void validateEMail(String email) {
    String[] parts = email.split("@"); // split returns an array
    if (parts.length == 2) {
        validateName(parts[0]);
        validateDomain(parts[1]);
    } else {
        throw ....
    }
}
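One thing to watch with this structure: String#split drops trailing empty strings by default, so "john.".split("\\.") yields just ["john"] and a trailing period would slip through. Passing a limit of -1 keeps the empty pieces. Below is a runnable sketch of the same idea. The two per-part patterns, the class name and the exception type are placeholders chosen for illustration, not rules taken from the question.

```java
class EmailPartsValidator {

    // Hypothetical per-part patterns; substitute the real rules.
    private static final String NAME_PART = "[a-zA-Z0-9_+-]+";
    private static final String DOMAIN_PART = "[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*";

    static void validateParts(String text, String partRegex, String what) {
        // Limit -1 keeps trailing empty strings, so "john." yields ["john", ""]
        // and the empty piece is rejected by the pattern below.
        for (String part : text.split("\\.", -1)) {
            if (!part.matches(partRegex)) {
                throw new IllegalArgumentException("Invalid " + what + " part: '" + part + "'");
            }
        }
    }

    static void validateEmail(String email) {
        String[] parts = email.split("@", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected exactly one @ in: " + email);
        }
        validateParts(parts[0], NAME_PART, "name");
        validateParts(parts[1], DOMAIN_PART, "domain");
    }

    // Convenience wrapper for callers that prefer a boolean to an exception.
    static boolean isValid(String email) {
        try {
            validateEmail(email);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Because every piece between periods must be non-empty, leading, trailing and consecutive periods are all rejected without a separate rule for each.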
Upvotes: 1