user805627

Reputation: 4417

Validate email addresses in JavaScript, compatible with non-ASCII characters

There are many regexes that can be used to validate an email address, but most of them aren't compatible with non-ASCII characters. Once an email address contains non-ASCII characters, like 'Rδοκιμή@παράδειγμα.δοκιμή' or '管理员@中国互联网络信息中心.中国', they fail to recognize it as valid. How can I construct a regex that validates email addresses and is compatible with non-ASCII characters?

Upvotes: 3

Views: 4024

Answers (3)

bhovhannes

Reputation: 5679

According to this source, JavaScript, which does not offer any Unicode support through its RegExp class, does support \uXXXX escapes (up to \uFFFF) for matching a single Unicode code point as part of its string syntax.
So, in order to match Unicode characters, a character class built from \uXXXX escapes has to be written out. The plugin listed here enables Unicode regular expressions when used with the XRegExp JavaScript library.
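
To illustrate the \uXXXX approach, here is a minimal sketch. The helper name and the chosen ranges (Greek and the basic CJK block) are assumptions made for this example, and they are far from a complete list of letters, which is why a library or plugin is usually preferred:

```javascript
// Illustrative ranges only; NOT a complete list of Unicode letters.
var GREEK = "\\u0370-\\u03FF\\u1F00-\\u1FFF"; // Greek and Greek Extended
var CJK = "\\u4E00-\\u9FFF";                 // CJK Unified Ideographs (basic block)

// Character class assembled from \uXXXX ranges.
var reLetters = new RegExp("^[A-Za-z" + GREEK + CJK + "]+$");

// Hypothetical helper: true if the text consists only of the letters listed above.
function isLettersOnly(text) {
    return reLetters.test(text);
}

// Usage example
console.log(isLettersOnly("中国")); // true
console.log(isLettersOnly("abc1")); // false (digits are not in the class)
```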

Here is a function that tests for a valid ASCII email address:

/**
 * Checks if string contains valid email address as described
 * in RFC 2822: http://tools.ietf.org/html/rfc2822#section-3.4.1
 * This function omits the syntax using double quotes and square brackets
 * @return {Boolean}    True, if test succeeded.
 */
String.prototype.checkEmail = function()
{
    var reEmail = /^[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;
    return reEmail.test(this);
}

// Usage example
alert( "user@example.com".checkEmail() ); // true
alert( "invalid_email.com".checkEmail() ); // false

To make it work for Unicode strings, one can include the XRegExp library and use \\p{L} instead of a-z. Here is the complete code:

<!DOCTYPE html>
<html>
<head>
    <script src="xregexp-all-min.js"></script>
    <script>
        /**
         * Checks if string contains valid email address as described
         * in RFC 2822: http://tools.ietf.org/html/rfc2822#section-3.4.1
         * This function omits the syntax using double quotes and square brackets
         * @return {Boolean}    True, if test succeeded.
         */
        String.prototype.checkEmailX = function()
        {
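            // Backslashes are doubled because the pattern is a string literal; a lone "\." would reach XRegExp as "." (any character)
            var reEmail = XRegExp("^[\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\p{L}0-9](?:[\\p{L}0-9-]*[\\p{L}0-9])?\\.)+[\\p{L}0-9](?:[\\p{L}0-9-]*[\\p{L}0-9])?$");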
            return reEmail.test(this);
        }

        alert( "true = " + "Rδοκιμή@παράδειγμα.δοκιμή".checkEmailX() ); // true
        alert( "true = " +"管理员@中国互联网络信息中心.中国".checkEmailX() ); // true
        alert( "true = " +"user@example.com".checkEmailX() ); // true
        alert( "false = " +"test_test.am".checkEmailX() ); // false
        alert( "true = " +"test@ράδ.am".checkEmailX() ); // true
    </script>
</head>
<body>
</body>
</html>
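
As a library-free alternative, the same pattern can be sketched with native Unicode property escapes. This assumes a JavaScript engine that implements ECMAScript 2018 or later (the "u" flag with \p{L}); the function name is made up for this example and is not part of the code above:

```javascript
// Same structure as the XRegExp pattern, written as a native regex literal.
// The "u" flag enables \p{L} (any Unicode letter); assumes an ES2018+ engine.
var reEmailUnicode = /^[\p{L}0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[\p{L}0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[\p{L}0-9](?:[\p{L}0-9-]*[\p{L}0-9])?\.)+[\p{L}0-9](?:[\p{L}0-9-]*[\p{L}0-9])?$/u;

// Hypothetical helper: true if the address matches the pattern above.
function checkEmailNative(address) {
    return reEmailUnicode.test(address);
}

// Usage example
console.log(checkEmailNative("管理员@中国.中国"));  // true
console.log(checkEmailNative("test_test.am"));     // false (no "@")
```

Note that \p{L} matches letters only, so an address typed with separate combining accent marks would be rejected unless the input is normalized first.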

Upvotes: 6

Paweł Dyda

Reputation: 18662

I once had to write an article on how to validate email addresses using regular expressions. Unfortunately, the conclusion was that it is not possible to validate email addresses this way.

Of course you would like to know why.

  1. Look at the examples of valid email addresses in the Wikipedia article. It is nearly impossible to write a regexp that would cover all of these cases.
  2. You surely already know about native-language top-level domains; that is the reason for your question. However, you might be unaware that, apart from the "standard" national TLDs, just about any name can appear here. And it is a moving target, so...
  3. There is no single policy that all domain registries follow. I found out, for example, that the Japanese registry allows ideographic full stops (both full-width and half-width) as label separators. I don't know how that could work, but that is what they allow. It also turned out that the Japanese and Chinese registries differ in maximum label length. I can't see how this could be validated with a regular expression.

So how can we validate an email address then? One idea would be to simply check whether an MTA exists for the given domain (which cannot be done on the front end, that is, with client-side JavaScript). Unfortunately, this poses the risk of a DoS attack, so it is not necessarily the greatest idea. And of course you still won't know whether the address is valid on the given server. To find that out, you would need to connect to the server and issue a VRFY command, but thanks to the spammers, most servers will reply "550 No such user".
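
The MTA check could be sketched on the server side roughly as follows. This is a minimal sketch assuming a Node.js back end and its built-in dns module; the helper names are invented for the example, and a real deployment would need rate limiting because of the DoS risk mentioned above:

```javascript
// Server-side only (Node.js): browsers cannot look up MX records.
var dns = require("dns");

// Hypothetical helper: returns the part after the last "@", or null if there is none.
function domainOf(address) {
    var at = address.lastIndexOf("@");
    return at > 0 && at < address.length - 1 ? address.slice(at + 1) : null;
}

// Hypothetical helper: calls back with true when the domain publishes at least one MX record.
// It says nothing about whether the mailbox itself exists.
function hasMailServer(address, callback) {
    var domain = domainOf(address);
    if (domain === null) {
        callback(false);
        return;
    }
    dns.resolveMx(domain, function (err, records) {
        callback(!err && records.length > 0);
    });
}
```

A domain without MX records may still accept mail through its address record, so a negative result is a hint rather than proof.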

If the sole purpose of the validation is to catch users' mistakes, you may want to add an additional field and have the user retype the email address (which is not the best idea either).

Upvotes: 5

Joey

Reputation: 354576

Don't overcomplicate things, please.

Take a moment and think about why you need that. Most likely because you want to send your user an email, right? So I'd advocate for the easiest email validation regex there is:

/@/

Done. It will validate all valid email addresses. It will also incorrectly validate a lot of stuff that just looks like one but isn't actually valid, but most errors are either not filling out a form field or confusing fields and entering the wrong things in other fields.

Also, you'll notice that an email address isn't valid when your mails bounce. And verifying that an address actually exists is something no regex can ever do for you.
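
Wrapped in a function, the check might look like this. The function name is an assumption for the example; the regex is the one given above:

```javascript
// Hypothetical wrapper around the /@/ check: accepts anything containing an "@".
function looksLikeEmail(input) {
    return /@/.test(input);
}

// Usage example
console.log(looksLikeEmail("管理员@中国.中国")); // true, non-ASCII is no obstacle
console.log(looksLikeEmail("John Smith"));       // false, wrong field filled in
console.log(looksLikeEmail("@"));                // true, accepted although not a real address
```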

Upvotes: 3
