starscream_disco_party

Reputation: 2996

Validate email (without sending confirmation)

Validation emails aren't an option :(

I have a very specific set of rules that I need to use to validate email addresses. I've tried the Apache Commons library as well as the JavaMail library; although both of those adhere to RFC 2822, some emails that are invalid according to my rules get through. I have been trying my luck with regexes (regexi?) to no avail. I know, I know. A regex isn't the best option and can take a lot of time and add complications. Still, I figured since I have rules outlined in not so difficult terms that building one for this specific instance will suffice.

The Rules:

  1. Local part of email address may use any of:
    • Upper- and lowercase letters
    • Digits 0-9
    • Special characters: , ! # $ % ^ & * ( ) ' ` + = - _ { } | ~
    • A period, but cannot start or end with a period
    • May not contain successive periods
  2. There must be an At Symbol (@) between the local and domain portions of email
  3. Domain must contain only letters, digits, underscores, periods and hyphens
    • Cannot begin with a hyphen
    • Cannot end with a hyphen
    • Cannot contain two successive hyphens
  4. There must be a period between the domain and TLD portions of the email
    • TLD must only contain letters
    • TLD must not end with a period

So far I have been trying to use the following regex:

^((?!.\.{2,}.)[^.][-a-zA-Z0-9_.\!\@\#\$\%\^\&\*\(\)\,\'\+\=\`\{\|\}\~\-]+[^.])@((?!.\-{2,}.)[^-_][-a-zA-Z0-9_.]+[^-_]\.[a-zA-z]+)$


^((?!.\.{2,}.)[^.][-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+[^.])@((?!.\-{2,}.)[^-_][-a-zA-Z0-9_.]+[^-_]\.[a-zA-z]+)$

This is still failing with invalid emails (e.g. [email protected]).

What am I missing or doing wrong with the regex? Is there another way I can ensure an email conforms to the requirements without a regex?

Thanks in advance!

P.S. This is in Java, so all escaped characters in above regex have to be double escaped (e.g. \. is \\.). I have also been using Regexper to help me visualize this since I am obviously no regex guru.

Upvotes: 0

Views: 538

Answers (2)

Bernhard Barker

Reputation: 55609

I suggest:

Split on the @ symbol, then split the remainder on the last period (using String#substring and String#lastIndexOf). Now that you have the local part, the domain and the TLD in separate strings, use if-statements to validate each one. If any rules apply to all of them (2 consecutive periods?), check those before splitting. This is much simpler to get right, much simpler to understand and much simpler to maintain.
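A minimal sketch of that split-and-check approach. The class and method names are made up for illustration, and a few choices are my assumptions rather than rules from the question: "letters" means ASCII letters, every part must be non-empty, exactly one @ is allowed, and consecutive periods are rejected anywhere in the address.

```java
final class SplitEmailValidator {
    // Special characters allowed in the local part, copied from rule 1 of the question.
    private static final String LOCAL_SPECIALS = ",!#$%^&*()'`+=-_{}|~";

    static boolean isValid(String email) {
        // Assumption: consecutive periods are invalid everywhere, so check before splitting.
        if (email == null || email.contains("..")) {
            return false;
        }
        int at = email.indexOf('@');
        // Assumption: exactly one '@' separates local part and domain.
        if (at < 0 || at != email.lastIndexOf('@')) {
            return false;
        }
        String local = email.substring(0, at);
        String afterAt = email.substring(at + 1);
        int lastDot = afterAt.lastIndexOf('.');
        if (lastDot < 0) {
            return false; // no period between domain and TLD
        }
        String domain = afterAt.substring(0, lastDot);
        String tld = afterAt.substring(lastDot + 1);
        return isValidLocal(local) && isValidDomain(domain) && isValidTld(tld);
    }

    private static boolean isValidLocal(String local) {
        if (local.isEmpty() || local.startsWith(".") || local.endsWith(".")) {
            return false;
        }
        for (char c : local.toCharArray()) {
            if (!isLetterOrDigit(c) && c != '.' && LOCAL_SPECIALS.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    private static boolean isValidDomain(String domain) {
        if (domain.isEmpty() || domain.startsWith("-") || domain.endsWith("-")
                || domain.contains("--")) {
            return false;
        }
        for (char c : domain.toCharArray()) {
            if (!isLetterOrDigit(c) && c != '_' && c != '.' && c != '-') {
                return false;
            }
        }
        return true;
    }

    private static boolean isValidTld(String tld) {
        if (tld.isEmpty()) {
            return false; // also rejects an address that ends with a period
        }
        for (char c : tld.toCharArray()) {
            if (!isLetter(c)) {
                return false;
            }
        }
        return true;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isLetterOrDigit(char c) {
        return isLetter(c) || (c >= '0' && c <= '9');
    }
}
```

Each rule maps to one short condition, so adding or relaxing a rule later means touching a single line instead of re-deriving a regex.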

But if you really want to stick to regex, here are a few things I noticed:

The [^.] before the @ should be (?<!\.), otherwise the last character before the @ can be just about anything.
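To see the difference, here is a small sketch with a deliberately simplified local part (lowercase letters and periods only) and a fixed domain `x`. Both patterns are cut down for illustration and are not the full regex.

```java
final class LookbehindDemo {
    // [^.] consumes one character of ANY kind (except a period) before the '@'.
    static final String WITH_NEGATED_CLASS = "[a-z.]+[^.]@x";

    // (?<!\.) consumes nothing; it only asserts that the previous character is not a period.
    static final String WITH_LOOKBEHIND = "[a-z.]+(?<!\\.)@x";

    static boolean matchesNegatedClass(String s) {
        return s.matches(WITH_NEGATED_CLASS);
    }

    static boolean matchesLookbehind(String s) {
        return s.matches(WITH_LOOKBEHIND);
    }
}
```

With `"ab @x"` the first pattern matches, because the space is swallowed by `[^.]`, while the lookbehind version rejects it. Both reject `"ab.@x"` and both accept `"ab@x"`.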

. is just one character, so (?!.\-{2,}.) and (?!.\.{2,}.) do not do what you think they do. Changing the leading . to .* seems to fix it, and you don't need to match any characters after the thing you're looking for.
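A quick sketch of why the single `.` matters, again with a cut-down character class (lowercase letters and periods) chosen only for illustration:

```java
final class LookaheadDemo {
    // Looks for: one character, then 2+ periods, then one character.
    // Anchored at the start, this only notices ".." beginning at index 1.
    static final String NARROW = "(?!.\\.{2,}.)[a-z.]+";

    // Looks for: any number of characters, then 2+ periods.
    // This notices ".." anywhere in the remaining input.
    static final String WIDE = "(?!.*\\.{2,})[a-z.]+";

    static boolean matchesNarrow(String s) {
        return s.matches(NARROW);
    }

    static boolean matchesWide(String s) {
        return s.matches(WIDE);
    }
}
```

`"a..b"` is rejected by both, but `"ab..cd"` slips past the narrow lookahead because the double period does not start at the second character.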

It hasn't been explicitly stated, but I presume the domain and TLD can't contain 2 successive periods either. If this is allowed, the first part of the regex needs to be (?!.*\.{2,}.*@) to stop at the @.

If you use String#matches, the ^ and $ aren't required, since it only succeeds when the whole string matches.
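For reference, a tiny sketch contrasting `String#matches` with `Matcher#find` (the helper names are hypothetical):

```java
import java.util.regex.Pattern;

final class MatchesAnchoringDemo {
    // String#matches succeeds only if the entire input matches the pattern.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Matcher#find succeeds if any substring matches, so anchors matter here.
    static boolean partialMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }
}
```

So `wholeMatch("abc", "b")` is false while `partialMatch("abc", "b")` is true. If you switch to `find` later, put the `^` and `$` back.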

There are some unneeded parentheses.

Final regex:

(?!.*\.{2,})[^.][-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+(?<!\.)@(?!.*\-{2,})[^-_][-a-zA-Z0-9_.]+[^-_]\.[a-zA-Z]+

If you choose to stick to regex, I suggest extensive commenting:

String regex =
          "(?!.*\\.{2,})" // doesn't contain 2 consecutive .'s
       // local part
          + "[^.]" // doesn't start with a .
          + "[-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+" // valid chars for local part
          + "(?<!\\.)" // last char of local part isn't a .
       // at symbol
          + "@"
       // domain
          ...

It might seem like overkill, but you'll wish you had if you try to maintain it a few months down the line, especially if you haven't touched any regex in those months.
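For what it's worth, here is one way a fully commented pattern could look. This is a sketch under my own assumptions, not a drop-in copy of the regex above: it uses lookarounds instead of `[^.]` and `[^-_]` so that the first and last characters must still come from the allowed set, it leaves @ out of the local-part class, and it rejects consecutive periods anywhere. The class name is made up.

```java
final class CommentedEmailRegex {
    static final String REGEX =
              "(?!.*\\.\\.)"        // no 2 consecutive .'s anywhere (assumption)
           // local part
            + "(?!\\.)"             // doesn't start with a .
            + "[-a-zA-Z0-9_.!#$%^&*(),'+=`{|}~]+" // valid chars for local part
            + "(?<!\\.)"            // last char of local part isn't a .
           // at symbol
            + "@"
           // domain
            + "(?!-)"               // doesn't start with a -
            + "(?!.*--)"            // no 2 consecutive -'s after the @
            + "[-a-zA-Z0-9_.]+"     // valid chars for domain
            + "(?<!-)"              // last char of domain isn't a -
           // TLD
            + "\\."                 // period between domain and TLD
            + "[a-zA-Z]+";          // letters only, so it can't end with a .

    static boolean isValid(String email) {
        // String#matches requires the whole input to match, so no ^ or $ needed.
        return email != null && email.matches(REGEX);
    }
}
```

Run it against your own list of known-good and known-bad addresses before trusting it; the rules as written leave a few cases open (for example, whether the domain may start with a period).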

Upvotes: 2

rolfl

Reputation: 17707

The common wisdom is that e-mail addresses are too complex for a single regex. It is easier to check an address by seeing whether an SMTP server can deliver to it. You have already been told that.

So, assuming you need to pre-validate an address (and assuming it is only the email portion, and not all the goodies you can have like unicode names, etc.) then my recommendation would be:

  1. Break the problem down into smaller parts
  2. Give each part a method
  3. Validate each part (perhaps in a loop)
  4. Use a combination of regex and standard logic to make sure it is valid (according to your rules)

This is the only realistic way to leave a somewhat reasonable system that's maintainable and understandable by the poor sucker who sees the code next time.

e.g.

private void validateNamePart(String npart) {
  if (!npart.matches("")) {
    throw new .....;
  }
}

private void validateName(String name) {
  int parts = 0;
  for (String npart : name.split("\\.")) {
    validateNamePart(npart);
    parts++;
  }
  if (parts == 0) {
     throw ....;
  }
}

private void validateDomainPart(String dpart) {
  ....
}

private void validateDomain(String domain) {
  ....
}

public void validateEMail(String email) {
  String[] parts = email.split("@");
  if (parts.length == 2) {
    validateName(parts[0]);
    validateDomain(parts[1]);
  } else {
    throw ....
  }
}
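As a rough illustration, the skeleton above could be filled in along these lines. Everything elided in the skeleton is my assumption here: the exception type (`IllegalArgumentException`), the per-part regexes (taken from the rules in the question, ASCII letters only), and the `isValid` convenience wrapper. I also pass `-1` as the limit to `split`, because by default `String#split` drops trailing empty strings, which would let a local part like `john.` through.

```java
final class PartwiseEmailValidator {

    static void validateEmail(String email) {
        // limit -1 keeps empty pieces, so "a@" and "a@b@c.com" are detected
        String[] parts = email.split("@", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected exactly one @");
        }
        validateName(parts[0]);
        validateDomain(parts[1]);
    }

    private static void validateName(String name) {
        // A leading, trailing or doubled period produces an empty part,
        // which fails the '+' quantifier below.
        for (String npart : name.split("\\.", -1)) {
            if (!npart.matches("[-a-zA-Z0-9_!#$%^&*(),'+=`{|}~]+")) {
                throw new IllegalArgumentException("bad local part: " + name);
            }
        }
    }

    private static void validateDomain(String domain) {
        int lastDot = domain.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException("missing TLD: " + domain);
        }
        String host = domain.substring(0, lastDot);
        String tld = domain.substring(lastDot + 1);
        if (!tld.matches("[a-zA-Z]+")) {
            throw new IllegalArgumentException("bad TLD: " + tld);
        }
        if (!host.matches("[-a-zA-Z0-9_.]+") || host.startsWith("-")
                || host.endsWith("-") || host.contains("--")) {
            throw new IllegalArgumentException("bad domain: " + host);
        }
    }

    // Hypothetical convenience wrapper for callers that prefer a boolean.
    static boolean isValid(String email) {
        try {
            validateEmail(email);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Throwing with a message per rule has the side benefit that you can tell the user (or your logs) which rule an address violated.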

Upvotes: 1
