Greg

Reputation: 7922

Is this a valid email address?

"Françoise Lefèvre"@example.com

I'm reading RFC 5321 to try to actually understand what constitutes a valid email address -- and I'm probably making this a lot more difficult than it needs to be -- but this has been bugging me.

               i.e., within a quoted string, any
               ASCII graphic or space is permitted
               without blackslash-quoting except
               double-quote and the backslash itself.

Does this mean that extended ASCII characters are valid within quotes, or does it imply the standard ASCII table only?

EDIT - With the answers in mind, here's a simple jQuery validator that could supplement the plugin's built-in email validation by checking the characters.

jQuery.validator.addMethod("ascii_email", function( value, element ) {
    // In compliance with RFC 5321, this allows all standard printing ASCII characters in quoted text.
    // Unquoted text must be US-ASCII alphanumeric or one of the following: ! # $ % & ' * + - / = ? ^ _ ` { | } ~
    // @ and . get a free pass, as this is meant to be used together with the email validator

    // Matches the first double-quoted string, allowing backslash-escaped quotes inside it
    var quotedString = /(["])(?:\\\1|.)*?\1/;
    var quoted = value.match(quotedString);

    var result = this.optional(element) ||
        (
            /^[\u0021\u0023-\u0027\u002a\u002b\u002d-\u002f\u0030-\u0039\u003d\u003f\u0040\u0041-\u005a\u005e-\u007e]+$/.test(value.replace(quotedString, "")) &&
            (quoted === null || /^[\u0020-\u007e]+$/.test(quoted[0]))
        );
    return result;
}, "Invalid characters");

The plugin's built-in validation appears to be pretty good, except for catching invalid characters. Out of the test cases listed here it only disallows comments, folding whitespace and addresses lacking a TLD (i.e., @localhost, @255.255.255.255) -- all of which I can easily live without.

Upvotes: 11

Views: 1910

Answers (4)

robertc

Reputation: 75747

The HTML5 spec has an interesting take on the issue of valid email addresses:

A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.

The nice thing about this is that you can then take a look at the open source browser's source code for validating it (look for the IsValidEmailAddress function). It's in C, but not too hard to translate to JS.
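As a rough illustration, the ABNF production quoted above can be written as a JavaScript regular expression. This is only a sketch built from that production, not a translation of any browser's code; the names `ATEXT_OR_DOT`, `LDH_STR`, `HTML5_EMAIL` and `isValidHtml5Email` are made up for the example.

```javascript
// Sketch of the ABNF quoted above:
//   1*( atext / "." ) "@" ldh-str 1*( "." ldh-str )
// atext (RFC 5322, section 3.2.3): letters, digits and ! # $ % & ' * + - / = ? ^ _ ` { | } ~
// ldh-str (RFC 1034, section 3.5): letters, digits and hyphens
// All names below are hypothetical, chosen for this example only.
var ATEXT_OR_DOT = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~.]";
var LDH_STR = "[A-Za-z0-9-]+";
var HTML5_EMAIL = new RegExp(
    "^" + ATEXT_OR_DOT + "+@" + LDH_STR + "(?:\\." + LDH_STR + ")+$"
);

function isValidHtml5Email(value) {
    return HTML5_EMAIL.test(value);
}
```

Because the production has no quoted-string form at all, an address such as "Françoise Lefèvre"@example.com is rejected outright, and so is anything without at least one dot in the domain part.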

Upvotes: 0

Heinzi

Reputation: 172390

In this RFC, ASCII means US-ASCII, i.e., no characters with a value greater than 127 are allowed. As evidence, here are some quotes from RFC 5321:

The mail data may contain any of the 128 ASCII character codes, [...]

[...]

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127). These characters MUST NOT be used in MAIL or RCPT commands or other commands that require mailbox names.

These quotes quite clearly imply that characters with a value greater than 127 are considered non-ASCII. Since such characters are explicitly forbidden in MAIL or RCPT commands, it is impossible to use them for e-mail addresses.

Thus, "Francoise Lefevre"@example.com is a perfectly valid address (according to the RFC), whereas "Françoise Lefèvre"@example.com is not.
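A minimal sketch of that rule in JavaScript, assuming it is enough to reject control characters and anything above 126. `isPrintableAscii` is a hypothetical helper name, and it checks characters only, not the overall address syntax.

```javascript
// Returns true when every character is a printable US-ASCII character
// (codes 32-126): no control characters (0-31, 127) and no high-bit octets.
// Hypothetical helper; it does not validate the structure of the address.
function isPrintableAscii(value) {
    return /^[\x20-\x7e]*$/.test(value);
}

var plain = isPrintableAscii('"Francoise Lefevre"@example.com');
var accented = isPrintableAscii('"Fran\u00e7oise Lef\u00e8vre"@example.com');
console.log(plain, accented);
```

The first address contains only characters in the 32-126 range and passes; the second contains ç and è, which lie outside it, and fails.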

Upvotes: 4

fredley

Reputation: 33921

Technically yes, but read on:

While the above definition for Local-part is relatively permissive, for maximum interoperability, a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form or where the Local-part is case-sensitive.

...

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters.

Upvotes: 1

James Black

Reputation: 41858

According to this MSDN page, extended ASCII characters aren't currently valid, but there is a proposed specification that would change this.

http://msdn.microsoft.com/en-us/library/system.net.mail.mailaddress(VS.90).aspx

The important part is here:

Thomas Lee is correct in that a quoted local part is valid in an email address and certain mail addresses may be invalid if not in a quoted string. However, the characters that others of you have mentioned such as the umlaut and the agave are not in the ASCII character set, they are extended ASCII. In RFC 2822 (and subsequent RFC's 5322 and 3696) the dtext specification (allowed in quoted local parts) only allows most ASCII values (RFC 2822, section 3.4.1) which includes values in ranges from 33-90 and 94-126. RFC 5335 has been proposed that would allow non-ascii characters in the addr-spec, however it is still labeled as experimental and as such is not supported in MailAddress.
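Taking the ranges in that quote at face value (33-90 and 94-126), a character check could be sketched as follows. The ranges come from the quote, not from a fresh reading of the RFCs, and `isInQuotedRanges` is a hypothetical name.

```javascript
// Returns true when the first character of `ch` falls in the ranges
// given in the quote above: 33-90 or 94-126.
// Assumption: the ranges are taken from the quote as-is.
function isInQuotedRanges(ch) {
    var code = ch.charCodeAt(0);
    return (code >= 33 && code <= 90) || (code >= 94 && code <= 126);
}

console.log(isInQuotedRanges("a"), isInQuotedRanges("\u00fc"));
```

An umlaut such as ü has code 252, well outside both ranges, which is why the quote treats it as invalid.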

Upvotes: 4
