acrosman

Reputation: 12900

How can I validate an email address using a regular expression?

Over the years I have slowly developed a regular expression that validates most email addresses correctly, assuming they don't use an IP address as the server part.

I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone who is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing four-character TLDs).

What is the best regular expression you have or have seen for validating emails?

I've seen several solutions that use functions with several shorter expressions, but I'd rather have one long, complex expression in a simple function than several short expressions in a more complex function.

Upvotes: 4265

Views: 2735175

Answers (30)

bravo

Reputation: 11

We get a more practical implementation of RFC 5322 if we omit IP addresses, domain-specific addresses, and the syntax using double quotes and square brackets. It will still match 99.99% of all email addresses in actual use today. See: https://www.regular-expressions.info/email.html

    private String REGEX_EMAIL_CHECK = "^[A-Za-z0-9!#%&'*+/=?^_`{|}~-]+(?:\\.[A-Za-z0-9!#%&'*+/=?^_`{|}~-]+)*@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$";


    private boolean isValidEmailWithRegex(String email) {

        return email.matches(REGEX_EMAIL_CHECK);

    }


    public void testEmailAddressRegex() {
        // valid ?
        String[] validEmailAddresses = new String[] {
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "a%[email protected]",
                "[email protected]",
                "a~^[email protected]",
                "a~&[email protected]",
                "a*[email protected]",
                "[email protected]",
                "a~!#%^&*[email protected]",
                "a~!%^&*[email protected]",
                "a~%^&*[email protected]",
                "a~^&*[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "[email protected]",
                "abcdefghijklmnopqrstuvwxyz-abcdefghijklmnopqrstuvwxyz@example.com",

        };

        for (String email : validEmailAddresses) {
            System.out.println("Testing email address: " + email + " is valid");
            assertTrue(isValidEmailWithRegex(email));
        }

        // Invalid ?
        String[] invalidEmailAddresses = new String[] {
                "",
                " ",
                "@",
                "username",
                "username@example",
                "username@example.",
                "[email protected].",
                "username@example-com.",
                "[email protected].",
                "username@@example.com",
                "username@example@com",
                "[email protected]@",
                "[email protected]@example",
                "[email protected]@example.com",
                "[email protected]@example.com.",
                "[email protected]@example-com.",
                "[email protected]@example.com-.",
                "[email protected]@example.com.example",
                "[email protected]@example-com.example",
                "[email protected]@example.com-example",
        };

        for (String email : invalidEmailAddresses) {
            System.out.println("Testing email address: " + email + " is Invalid");
            assertFalse(isValidEmailWithRegex(email));
        }
    }

Upvotes: 1

Mark Stewart

Reputation: 2098

For Oracle Database, version 23+, there is a built-in usage domain named email_d, which uses the regex:

^([a-zA-Z0-9!#$%&*+=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&*+=?^_`{|}~-]+)*)@(([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)$

Of course, this is rather simplistic compared to the other answers, but... it is built-in.
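To try the built-in pattern outside the database, here is a minimal Java sketch that applies the same regex (written without stray whitespace), assuming Java's regex engine treats this particular pattern the same way Oracle's does. The class and method names are hypothetical. It also shows one consequence of the character class: unlike RFC 5322 atext, it does not include ' or /.

```java
import java.util.regex.Pattern;

class OracleEmailDomainCheck {
    // The email_d regex, escaped for a Java string literal.
    private static final Pattern EMAIL_D = Pattern.compile(
            "^([a-zA-Z0-9!#$%&*+=?^_`{|}~-]+(\\.[A-Za-z0-9!#$%&*+=?^_`{|}~-]+)*)"
            + "@(([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)$");

    static boolean isValid(String email) {
        return EMAIL_D.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.doe@example.com")); // true
        System.out.println(isValid("john@localhost"));       // false: the domain needs at least one dot
        System.out.println(isValid("o'brien@example.com"));  // false: ' is not in the local-part class
    }
}
```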

Upvotes: 1

Thejaka

Reputation: 27

This answer largely and directly addresses multiple issues in the currently highest upvoted answer.

This answer also reinterprets and optimizes the regex used, for example, in WebKit for the email input type.

The explicit ordering of certain parts of the expressions, and of the characters and ranges in character classes, is in some cases intentional. For example, some parts of the patterns have been intentionally optimized for lowercase letters, digits, and uppercase letters, in that order, on the assumption that this covers the most frequent usages (and, where it does not, the canonicalized form).

First, an attempt at a potentially correct implementation, barring any errata (currently, still under active development and improvement):

Email (RFCs 5322 and 5321 interpreted for Internet addresses)

  1. This validation RegExp for JavaScript as specified, once the foldable white space is stripped, is for addr-spec, used as Mailbox. I believe this is the most common validation use-case.
  2. Emphasis on INTERNET as opposed to Intranet or Local. If you need local/intranet host name version, please substitute the domain portion with your own.
  3. quoted-string is intentionally not allowed to be empty. This may be a slight deviation from the RFC as strictly defined. If anyone thinks this should not be so, please comment.
  4. I have interpreted or inferred the specs as follows: the maximum length of the domain portion of an email address is intended to be 254, excluding an implicit (usually omitted) final period (dot: ".") for the domain root. I interpret this as leaving room in a 256-byte string buffer for the longest domain part, an implicit final period/dot, and a null terminator: 254 (full domain name without the final dot) + 1 (final dot) + 1 (\0 or \x00 null terminator) = 256. The local part should have a maximum length of 64.
  5. CFWS is omitted, as per the spec. Strip the white space from the pattern (except the single space in the quoted-pair character class) before use, as your environment (such as JavaScript) requires. I will add a one-liner once I have finalized the expression.
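Length limits like those in point 4 are awkward to express in a single regex, so they are often checked separately. A minimal Java sketch, assuming the author's interpretation (local part at most 64 characters, domain at most 254 characters without the root dot) plus the 63-character DNS label limit. The class and method names are hypothetical, and only lengths are checked, not syntax:

```java
class EmailLengthCheck {
    static final int MAX_LOCAL = 64;   // local-part limit
    static final int MAX_DOMAIN = 254; // domain limit, excluding the implicit root dot
    static final int MAX_LABEL = 63;   // DNS limit per dot-separated label

    static boolean isWithinLimits(String address) {
        // Split at the last '@', because a quoted local part may itself contain '@'.
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            return false; // missing local part or domain
        }
        String local = address.substring(0, at);
        String domain = address.substring(at + 1);
        if (local.length() > MAX_LOCAL || domain.length() > MAX_DOMAIN) {
            return false;
        }
        // A limit of -1 keeps trailing empty strings, so "example." yields an empty label.
        for (String label : domain.split("\\.", -1)) {
            if (label.isEmpty() || label.length() > MAX_LABEL) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWithinLimits("user@example.com"));                // true
        System.out.println(isWithinLimits("a".repeat(65) + "@example.com"));   // false: local part too long
        System.out.println(isWithinLimits("user@" + "b".repeat(64) + ".com")); // false: label too long
    }
}
```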

Domain part for Intranet/Local:

(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-zA-Z]([a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*

Full and Expanded RegExp:

^(?:
    [-^-~/-9A-Z!#-'*+=?]+(?:\.[-^-~/-9A-Z!#-'*+=?]+)*
    |
    "
        (?:
            [!#-[\]-~]
            |
            \\[ -~\t]
        )+
    "
)@(?:
    (?=.{4,254}$)(?:[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)\.)+[a-zA-Z][a-z0-9A-Z-]{0,61}[a-z0-9A-Z]
    |
    \[
        (?:
            (?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])(?:\.(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])){3}
            |
            [a-zA-Z0-9-]*[a-zA-Z0-9]:[!-Z^-~]+
        )
    \]
)$
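A minimal Java sketch of the full expression with the white space stripped, assuming Java's regex flavor. Three adjustments are my assumptions rather than part of the expression as posted: `[` is escaped inside the qtext class, because Java treats an unescaped `[` inside a class as the start of a nested class; the IPv4 and general address-literal alternatives are grouped together inside `\[ ... \]`; and dcontent is repeated with `+`, following the RFC 5321 grammar `Standardized-tag ":" 1*dcontent`. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ExpandedEmailRegex {
    // Local part: a dot-atom of atext, or a non-empty quoted-string (qtext or quoted-pair).
    private static final String LOCAL =
            "(?:[-^-~/-9A-Z!#-'*+=?]+(?:\\.[-^-~/-9A-Z!#-'*+=?]+)*"
            + "|\"(?:[!#-\\[\\]-~]|\\\\[ -~\\t])+\")";

    // Internet host name: 4-254 characters, labels of at most 63 characters, letter-led TLD.
    private static final String HOST =
            "(?=.{4,254}$)(?:[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)\\.)+"
            + "[a-zA-Z][a-z0-9A-Z-]{0,61}[a-z0-9A-Z]";

    // One IPv4 octet, 0-255, without leading zeros.
    private static final String OCTET = "(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])";

    // Address literal: both alternatives sit inside the brackets.
    private static final String LITERAL =
            "\\[(?:" + OCTET + "(?:\\." + OCTET + "){3}"
            + "|[a-zA-Z0-9-]*[a-zA-Z0-9]:[!-Z^-~]+)\\]";

    private static final Pattern EMAIL =
            Pattern.compile("^" + LOCAL + "@(?:" + HOST + "|" + LITERAL + ")$");

    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "user@example.com", "\"john.doe\"@example.com", "user@[192.168.0.1]",
            "user@[IPv6:2001:db8::1]", "user@example", "\"\"@example.com", "user@[256.1.1.1]"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first four samples are accepted; a dotless (intranet) domain, an empty quoted string, and an out-of-range IPv4 octet are rejected, matching points 2 and 3 of the list above.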

For comparison, the original, as seems to have been reposted by @DouglasDaseeco:

^(?:
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
    |
    "
        (?:
            [\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
            |
            \\[\x01-\x09\x0b\x0c\x0e-\x7f]
        )*
    "
)@(?:
    (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
    |
    \[
        (?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])
        |
        [a-z0-9-]*[a-z0-9]:
            (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
    \]
)$

From the WHATWG

Specification

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

Optimized: Strict #1

/^[--9^-~A-Z!#-'*+=?]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

Alternative

/^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$/

From the WebKit project (practically the same as the WHATWG pattern)

Original:

^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

Optimized:

^[--9^-~A-Z!#-'*+=?]+@[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$

Alternative:

^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$
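For comparison, a minimal Java sketch applying the WHATWG specification pattern (the JavaScript slashes dropped; class and method names are hypothetical). It illustrates how permissive that pattern is: it accepts a dotless domain and consecutive dots in the local part, which RFC 5322 dot-atom-text does not allow.

```java
import java.util.regex.Pattern;

class WhatwgEmailCheck {
    // The WHATWG pattern, escaped for a Java string literal ('/' needs no escape in Java).
    private static final Pattern WHATWG = Pattern.compile(
            "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
            + "@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
            + "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    static boolean isValid(String email) {
        return WHATWG.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user@example.com"));  // true
        System.out.println(isValid("user@localhost"));    // true: no dot required in the domain
        System.out.println(isValid("a..b@example.com"));  // true: consecutive dots allowed
        System.out.println(isValid("user@-example.com")); // false: a label cannot start with '-'
    }
}
```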

Some definitions extracted from RFC 5322

address         = mailbox / group
mailbox         = name-addr / addr-spec
name-addr       = [display-name] angle-addr
angle-addr      = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
group           = display-name ":" [group-list] ";" [CFWS]
display-name    = phrase
mailbox-list    = (mailbox *("," mailbox)) / obs-mbox-list
address-list    = (address *("," address)) / obs-addr-list
group-list      = mailbox-list / CFWS / obs-group-list

addr-spec       = local-part "@" domain
local-part      = dot-atom / quoted-string / obs-local-part
domain          = dot-atom / domain-literal / obs-domain
domain-literal  = [CFWS] "[" *([FWS] dtext) "]" [CFWS]
dtext           = %d33-90 / %d94-126 / obs-dtext
                        ; Printable US-ASCII characters not including "[", "]", or "\"

quoted-string   = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
qcontent        = qtext / quoted-pair
qtext           = %d33 / %d35-91 / %d93-126 / obs-qtext
                        ; Printable US-ASCII characters not including "\" or the quote character

dot-atom        = [CFWS] dot-atom-text [CFWS]
dot-atom-text   = 1*atext *("." atext)
atext           = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" /
                    "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"

Some definitions expanded or reinterpreted

dtext           = !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~
                = !-Z^-~
qtext           = !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
                = !#-[\]-~
atext           = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~
                = -^-~/-9A-Z!#-'*+=?

DIGIT           = %x30-39               ; 0-9
                = 0-9
                = \d
ALPHA           = %x41-5A / %x61-7A     ; A-Z / a-z
                = A-Za-z
VCHAR           = %x21-7E               ; Visible (printing) characters
                = !-~
WSP             = SP / HTAB             ; White space
                = [ \t]
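One way to check that the compressed character classes match the RFC 5322 definitions is to compare them against every ASCII code point. A small Java sketch (class and helper names are hypothetical); `[` is escaped in the qtext class because Java treats an unescaped `[` inside a class as a nested class:

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

class CharClassCheck {
    // True if the regex class accepts exactly the code points 0..127 that the RFC predicate accepts.
    static boolean sameOverAscii(String regexClass, IntPredicate rfc) {
        Pattern p = Pattern.compile(regexClass);
        for (int c = 0; c < 128; c++) {
            boolean byRegex = p.matcher(String.valueOf((char) c)).matches();
            if (byRegex != rfc.test(c)) {
                return false;
            }
        }
        return true;
    }

    // atext = ALPHA / DIGIT / the listed specials
    static boolean isAtext(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || "!#$%&'*+-/=?^_`{|}~".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println("atext: " + sameOverAscii("[-^-~/-9A-Z!#-'*+=?]", CharClassCheck::isAtext));
        // dtext = %d33-90 / %d94-126
        System.out.println("dtext: " + sameOverAscii("[!-Z^-~]",
                c -> (c >= 33 && c <= 90) || (c >= 94 && c <= 126)));
        // qtext = %d33 / %d35-91 / %d93-126
        System.out.println("qtext: " + sameOverAscii("[!#-\\[\\]-~]",
                c -> c == 33 || (c >= 35 && c <= 91) || (c >= 93 && c <= 126)));
    }
}
```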

Apparent issues in the accepted, most-upvoted answer

  • The very first version of the answer does not seem to be a regex.
  • Part of the answer seems to deal with parsing the whole message, including headers, content, and body. The spec as a whole is not relevant here: you have to go through all the RFCs and specs for the obscure points, but keep the focus on addr-spec.

Parentheses seem to be mismatched or mispositioned

Last part of the IPv4 pattern seems to have been grouped together with the address-literal pattern.

Control characters should be prohibited

Including but not limited to \x7f which is equal to ASCII 127 or DEL.

RFC 5321 section 4.1.2:

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127). These characters MUST NOT be used in MAIL or RCPT commands or other commands that require mailbox names.

Control characters are not allowed in address-literal

RFC 5321 sections 4.1.2, 4.1.3:

address-literal  = "[" ( IPv4-address-literal /
                 IPv6-address-literal /
                 General-address-literal ) "]"

IPv4-address-literal  = Snum 3("."  Snum)

IPv6-address-literal  = "IPv6:" IPv6-addr

General-address-literal  = Standardized-tag ":" 1*dcontent

Standardized-tag  = Ldh-str
                  ; Standardized-tag MUST be specified in a
                  ; Standards-Track RFC and registered with IANA

Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig

If these solutions are cross-checked and peer-verified, anyone may incorporate this info into the original community-wiki answer, with appropriate credit.

Upvotes: 3

Aman Godara

Reputation: 511

If you are looking to check against a simple format, such as [email protected] or [email protected]:

/^[^.\s@]+(\.[^.\s@]+)*@[^.\s@]+(\.[^.\s@]+)+$/

Exact format used: ^x(.x)*@x(.x)+$, where:

  • x: any non-empty string without . (dot), \s (whitespace character), or @ (at sign) in it
  • *: repeat zero or more times
  • +: repeat one or more times
  • ^: start of string
  • $: end of string
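A minimal Java sketch of the same ^x(.x)*@x(.x)+$ pattern (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class SimpleFormatCheck {
    // x = one or more characters that are not '.', whitespace, or '@'
    private static final Pattern SIMPLE =
            Pattern.compile("^[^.\\s@]+(\\.[^.\\s@]+)*@[^.\\s@]+(\\.[^.\\s@]+)+$");

    static boolean isValid(String email) {
        return SIMPLE.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("first.last@mail.example.com")); // true
        System.out.println(isValid("user@localhost"));              // false: the domain needs a dot
        System.out.println(isValid("first..last@example.com"));     // false: empty x between the dots
    }
}
```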

Upvotes: 0

AZ_

Reputation: 21899

Based on the grammar in RFC 2822 (since superseded by RFC 5322), a regex for valid email addresses is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

If you want to use it in Java, it's really very easy:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSample
{
    public static void main(String[] args)
    {
        //Input the string for validation
        String email = "[email protected]";

        //Set the email pattern string: the regex above, escaped for a Java string literal
        //(no leading space, the quote of the quoted-string restored, \\\\ for a literal backslash).
        //CASE_INSENSITIVE is used because the pattern only lists lowercase letters.
        Pattern p = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"
                + "\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")"
                + "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])",
                Pattern.CASE_INSENSITIVE);

        //Match the whole string against the pattern
        Matcher m = p.matcher(email);

        //Check whether match is found
        boolean matchFound = m.matches();

        if (matchFound)
            System.out.println("Valid Email Id.");
        else
            System.out.println("Invalid Email Id.");
    }
}

Upvotes: 16

Cees Timmerman

Reputation: 19644

I'm still using:

^[A-Za-z0-9._+\-\']+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$

But with IPv6 and Unicode coming up, perhaps this is best:

console.log(/^[\p{L}!#-'*+\-/\d=?^-~]+(\.[\p{L}!#-'*+\-/\d=?^-~]+)*@[^@\s]{2,}$/u.test("תה.בועות@😀.fm"))

Gmail allows sequential dots, but Microsoft Exchange Server 2007 refuses them, which follows the most recent standard as far as I know.
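A minimal Java sketch of the Unicode-aware variant, with the dot between local-part segments escaped (`\.`) and each segment repeated with `+`, since an unescaped `.` would match any character. Class and method names are hypothetical; the Hebrew and emoji sample is written with Java Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

class UnicodeEmailCheck {
    // Local-part characters: any Unicode letter, ASCII digits, and the RFC 5322 atext symbols.
    private static final String ATOM = "[\\p{L}!#-'*+\\-/\\d=?^-~]+";
    private static final Pattern EMAIL =
            Pattern.compile("^" + ATOM + "(?:\\." + ATOM + ")*@[^@\\s]{2,}$");

    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        // The Hebrew local part and emoji domain from the example above.
        System.out.println(isValid("\u05EA\u05D4.\u05D1\u05D5\u05E2\u05D5\u05EA@\uD83D\uDE00.fm")); // true
        System.out.println(isValid("a..b@example.com")); // false: empty segment between the dots
    }
}
```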

Upvotes: 7

bortzmeyer

Reputation: 35459

The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.

One RFC 5322 compliant regex can be found at the top of the page at http://emailregex.com/, but it uses the IP address pattern that is floating around the internet with a bug that allows 00 for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using grep -Po, including cases with domain names, IP addresses, bad ones, and account names with and without quotes.

Correcting the 00 bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for actual code.)

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Here is a diagram of the finite state machine for the above regexp, which is clearer than the regexp itself:

[Diagram: finite state machine for the regexp above]

The more sophisticated patterns in Perl and PCRE (regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it’s best to use a real parser.

It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.

Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down [email protected], and that will even parse as legal, but it isn't likely to be the person at the other end.
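The confirmation-token idea can be sketched in Java as below. It is only an illustration: the class and method names and the in-memory map are hypothetical, and a real system would also persist and expire tokens and send them by email. The token comes from SecureRandom and is compared in constant time with MessageDigest.isEqual.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class ConfirmationTokens {
    private final SecureRandom random = new SecureRandom();
    // Hypothetical in-memory store: address -> outstanding token.
    private final Map<String, String> pending = new HashMap<>();

    // Create a random, URL-safe token for the address; it would be emailed to that address.
    String issue(String address) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        pending.put(address, token);
        return token;
    }

    // True only if the submitted token matches; a confirmed token cannot be reused.
    boolean confirm(String address, String submitted) {
        String expected = pending.get(address);
        if (expected == null || submitted == null) {
            return false;
        }
        boolean ok = MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                submitted.getBytes(StandardCharsets.UTF_8)); // constant-time comparison
        if (ok) {
            pending.remove(address);
        }
        return ok;
    }

    public static void main(String[] args) {
        ConfirmationTokens tokens = new ConfirmationTokens();
        String token = tokens.issue("user@example.com");
        System.out.println(tokens.confirm("user@example.com", token)); // true
        System.out.println(tokens.confirm("user@example.com", token)); // false: already used
    }
}
```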

For PHP, you should not use the pattern given in Validate an E-Mail Address with PHP, the Right Way from which I quote:

There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard.

That is no better than all the other non-RFC patterns. It isn't even smart enough to handle RFC 822, let alone RFC 5322. This one, however, is.

If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is rude from the user's perspective. A state engine for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid, as it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like:

The specified e-mail address 'myemail@address,com' is invalid. Did you mean '[email protected]'?
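A full state engine is well beyond a short example. As a much smaller, hypothetical stand-in for the suggestion step, the Java sketch below handles only the comma-for-dot typo from the message above; the class, method, and heuristic are assumptions, not part of any RFC.

```java
import java.util.Optional;

class EmailSuggestion {
    // If the domain part contains commas, suggest the same address with commas replaced by dots.
    static Optional<String> suggest(String address) {
        int at = address.lastIndexOf('@');
        if (at <= 0) {
            return Optional.empty();
        }
        String domain = address.substring(at + 1);
        if (!domain.contains(",")) {
            return Optional.empty();
        }
        return Optional.of(address.substring(0, at + 1) + domain.replace(',', '.'));
    }

    public static void main(String[] args) {
        String input = "myemail@address,com";
        suggest(input).ifPresent(s -> System.out.println(
                "The specified e-mail address '" + input + "' is invalid. Did you mean '" + s + "'?"));
    }
}
```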

See also Validating Email Addresses, including the comments. Or Comparing E-mail Address Validating Regular Expressions.

Upvotes: 3477

Learner

Reputation: 5292

I did not find any answer that deals with the top-level domain name, but it should be considered.

So for me the following worked:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}AAA|AARP|ABB|ABBOTT|ABOGADO|AC|ACADEMY|ACCENTURE|ACCOUNTANT|ACCOUNTANTS|ACO|ACTIVE|ACTOR|AD|ADAC|ADS|ADULT|AE|AEG|AERO|AF|AFL|AG|AGENCY|AI|AIG|AIRFORCE|AIRTEL|AL|ALIBABA|ALIPAY|ALLFINANZ|ALSACE|AM|AMICA|AMSTERDAM|ANALYTICS|ANDROID|AO|APARTMENTS|APP|APPLE|AQ|AQUARELLE|AR|ARAMCO|ARCHI|ARMY|ARPA|ARTE|AS|ASIA|ASSOCIATES|AT|ATTORNEY|AU|AUCTION|AUDI|AUDIO|AUTHOR|AUTO|AUTOS|AW|AX|AXA|AZ|AZURE|BA|BAIDU|BAND|BANK|BAR|BARCELONA|BARCLAYCARD|BARCLAYS|BARGAINS|BAUHAUS|BAYERN|BB|BBC|BBVA|BCN|BD|BE|BEATS|BEER|BENTLEY|BERLIN|BEST|BET|BF|BG|BH|BHARTI|BI|BIBLE|BID|BIKE|BING|BINGO|BIO|BIZ|BJ|BLACK|BLACKFRIDAY|BLOOMBERG|BLUE|BM|BMS|BMW|BN|BNL|BNPPARIBAS|BO|BOATS|BOEHRINGER|BOM|BOND|BOO|BOOK|BOOTS|BOSCH|BOSTIK|BOT|BOUTIQUE|BR|BRADESCO|BRIDGESTONE|BROADWAY|BROKER|BROTHER|BRUSSELS|BS|BT|BUDAPEST|BUGATTI|BUILD|BUILDERS|BUSINESS|BUY|BUZZ|BV|BW|BY|BZ|BZH|CA|CAB|CAFE|CAL|CALL|CAMERA|CAMP|CANCERRESEARCH|CANON|CAPETOWN|CAPITAL|CAR|CARAVAN|CARDS|CARE|CAREER|CAREERS|CARS|CARTIER|CASA|CASH|CASINO|CAT|CATERING|CBA|CBN|CC|CD|CEB|CENTER|CEO|CERN|CF|CFA|CFD|CG|CH|CHANEL|CHANNEL|CHAT|CHEAP|CHLOE|CHRISTMAS|CHROME|CHURCH|CI|CIPRIANI|CIRCLE|CISCO|CITIC|CITY|CITYEATS|CK|CL|CLAIMS|CLEANING|CLICK|CLINIC|CLINIQUE|CLOTHING|CLOUD|CLUB|CLUBMED|CM|CN|CO|COACH|CODES|COFFEE|COLLEGE|COLOGNE|COM|COMMBANK|COMMUNITY|COMPANY|COMPARE|COMPUTER|COMSEC|CONDOS|CONSTRUCTION|CONSULTING|CONTACT|CONTRACTORS|COOKING|COOL|COOP|CORSICA|COUNTRY|COUPONS|COURSES|CR|CREDIT|CREDITCARD|CREDITUNION|CRICKET|CROWN|CRS|CRUISES|CSC|CU|CUISINELLA|CV|CW|CX|CY|CYMRU|CYOU|CZ|DABUR|DAD|DANCE|DATE|DATING|DATSUN|DAY|DCLK|DE|DEALER|DEALS|DEGREE|DELIVERY|DELL|DELTA|DEMOCRAT|DENTAL|DENTIST|DESI|DESIGN|DEV|DIAMONDS|DIET|DIGITAL|DIRECT|DIRECTORY|DISCOUNT|DJ|DK|DM|DNP|DO|DOCS|DOG|DOHA|DOMAINS|DOOSAN|DOWNLOAD|DRIVE|DUBAI|DURBAN|DVAG|DZ|EARTH|EAT|EC|EDEKA|EDU|EDUCATION|EE|EG|EMAIL|EMERCK|ENERGY|ENGINEER|ENGINEERING|ENTERPRISES|EPSON|EQUIPMENT|ER|
ERNI|ES|ESQ|ESTATE|ET|EU|EUROVISION|EUS|EVENTS|EVERBANK|EXCHANGE|EXPERT|EXPOSED|EXPRESS|FAGE|FAIL|FAIRWINDS|FAITH|FAMILY|FAN|FANS|FARM|FASHION|FAST|FEEDBACK|FERRERO|FI|FILM|FINAL|FINANCE|FINANCIAL|FIRESTONE|FIRMDALE|FISH|FISHING|FIT|FITNESS|FJ|FK|FLIGHTS|FLORIST|FLOWERS|FLSMIDTH|FLY|FM|FO|FOO|FOOTBALL|FORD|FOREX|FORSALE|FORUM|FOUNDATION|FOX|FR|FRESENIUS|FRL|FROGANS|FUND|FURNITURE|FUTBOL|FYI|GA|GAL|GALLERY|GAME|GARDEN|GB|GBIZ|GD|GDN|GE|GEA|GENT|GENTING|GF|GG|GGEE|GH|GI|GIFT|GIFTS|GIVES|GIVING|GL|GLASS|GLE|GLOBAL|GLOBO|GM|GMAIL|GMO|GMX|GN|GOLD|GOLDPOINT|GOLF|GOO|GOOG|GOOGLE|GOP|GOT|GOV|GP|GQ|GR|GRAINGER|GRAPHICS|GRATIS|GREEN|GRIPE|GROUP|GS|GT|GU|GUCCI|GUGE|GUIDE|GUITARS|GURU|GW|GY|HAMBURG|HANGOUT|HAUS|HEALTH|HEALTHCARE|HELP|HELSINKI|HERE|HERMES|HIPHOP|HITACHI|HIV|HK|HM|HN|HOCKEY|HOLDINGS|HOLIDAY|HOMEDEPOT|HOMES|HONDA|HORSE|HOST|HOSTING|HOTELES|HOTMAIL|HOUSE|HOW|HR|HSBC|HT|HU|HYUNDAI|IBM|ICBC|ICE|ICU|ID|IE|IFM|IINET|IL|IM|IMMO|IMMOBILIEN|IN|INDUSTRIES|INFINITI|INFO|ING|INK|INSTITUTE|INSURANCE|INSURE|INT|INTERNATIONAL|INVESTMENTS|IO|IPIRANGA|IQ|IR|IRISH|IS|ISELECT|IST|ISTANBUL|IT|ITAU|IWC|JAGUAR|JAVA|JCB|JE|JETZT|JEWELRY|JLC|JLL|JM|JMP|JO|JOBS|JOBURG|JOT|JOY|JP|JPRS|JUEGOS|KAUFEN|KDDI|KE|KFH|KG|KH|KI|KIA|KIM|KINDER|KITCHEN|KIWI|KM|KN|KOELN|KOMATSU|KP|KPN|KR|KRD|KRED|KW|KY|KYOTO|KZ|LA|LACAIXA|LAMBORGHINI|LAMER|LANCASTER|LAND|LANDROVER|LANXESS|LASALLE|LAT|LATROBE|LAW|LAWYER|LB|LC|LDS|LEASE|LECLERC|LEGAL|LEXUS|LGBT|LI|LIAISON|LIDL|LIFE|LIFEINSURANCE|LIFESTYLE|LIGHTING|LIKE|LIMITED|LIMO|LINCOLN|LINDE|LINK|LIVE|LIVING|LIXIL|LK|LOAN|LOANS|LOL|LONDON|LOTTE|LOTTO|LOVE|LR|LS|LT|LTD|LTDA|LU|LUPIN|LUXE|LUXURY|LV|LY|MA|MADRID|MAIF|MAISON|MAKEUP|MAN|MANAGEMENT|MANGO|MARKET|MARKETING|MARKETS|MARRIOTT|MBA|MC|MD|ME|MED|MEDIA|MEET|MELBOURNE|MEME|MEMORIAL|MEN|MENU|MEO|MG|MH|MIAMI|MICROSOFT|MIL|MINI|MK|ML|MM|MMA|MN|MO|MOBI|MOBILY|MODA|MOE|MOI|MOM|MONASH|MONEY|MONTBLANC|MORMON|MORTGAGE|MOSCOW|MOTORCYCLES|MOV|MOVIE|MOVISTAR|MP|MQ|MR|MS|MT|MTN|MTPC|MTR|MU|MUSEUM|MUTUELLE|MV|MW|MX|MY|MZ|NA|NA
DEX|NAGOYA|NAME|NAVY|NC|NE|NEC|NET|NETBANK|NETWORK|NEUSTAR|NEW|NEWS|NEXUS|NF|NG|NGO|NHK|NI|NICO|NINJA|NISSAN|NL|NO|NOKIA|NORTON|NOWRUZ|NP|NR|NRA|NRW|NTT|NU|NYC|NZ|OBI|OFFICE|OKINAWA|OM|OMEGA|ONE|ONG|ONL|ONLINE|OOO|ORACLE|ORANGE|ORG|ORGANIC|ORIGINS|OSAKA|OTSUKA|OVH|PA|PAGE|PAMPEREDCHEF|PANERAI|PARIS|PARS|PARTNERS|PARTS|PARTY|PE|PET|PF|PG|PH|PHARMACY|PHILIPS|PHOTO|PHOTOGRAPHY|PHOTOS|PHYSIO|PIAGET|PICS|PICTET|PICTURES|PID|PIN|PING|PINK|PIZZA|PK|PL|PLACE|PLAY|PLAYSTATION|PLUMBING|PLUS|PM|PN|POHL|POKER|PORN|POST|PR|PRAXI|PRESS|PRO|PROD|PRODUCTIONS|PROF|PROMO|PROPERTIES|PROPERTY|PROTECTION|PS|PT|PUB|PW|PY|QA|QPON|QUEBEC|RACING|RE|READ|REALTOR|REALTY|RECIPES|RED|REDSTONE|REDUMBRELLA|REHAB|REISE|REISEN|REIT|REN|RENT|RENTALS|REPAIR|REPORT|REPUBLICAN|REST|RESTAURANT|REVIEW|REVIEWS|REXROTH|RICH|RICOH|RIO|RIP|RO|ROCHER|ROCKS|RODEO|ROOM|RS|RSVP|RU|RUHR|RUN|RW|RWE|RYUKYU|SA|SAARLAND|SAFE|SAFETY|SAKURA|SALE|SALON|SAMSUNG|SANDVIK|SANDVIKCOROMANT|SANOFI|SAP|SAPO|SARL|SAS|SAXO|SB|SBS|SC|SCA|SCB|SCHAEFFLER|SCHMIDT|SCHOLARSHIPS|SCHOOL|SCHULE|SCHWARZ|SCIENCE|SCOR|SCOT|SD|SE|SEAT|SECURITY|SEEK|SELECT|SENER|SERVICES|SEVEN|SEW|SEX|SEXY|SFR|SG|SH|SHARP|SHELL|SHIA|SHIKSHA|SHOES|SHOW|SHRIRAM|SI|SINGLES|SITE|SJ|SK|SKI|SKIN|SKY|SKYPE|SL|SM|SMILE|SN|SNCF|SO|SOCCER|SOCIAL|SOFTBANK|SOFTWARE|SOHU|SOLAR|SOLUTIONS|SONY|SOY|SPACE|SPIEGEL|SPREADBETTING|SR|SRL|ST|STADA|STAR|STARHUB|STATEFARM|STATOIL|STC|STCGROUP|STOCKHOLM|STORAGE|STUDIO|STUDY|STYLE|SU|SUCKS|SUPPLIES|SUPPLY|SUPPORT|SURF|SURGERY|SUZUKI|SV|SWATCH|SWISS|SX|SY|SYDNEY|SYMANTEC|SYSTEMS|SZ|TAB|TAIPEI|TAOBAO|TATAMOTORS|TATAR|TATTOO|TAX|TAXI|TC|TCI|TD|TEAM|TECH|TECHNOLOGY|TEL|TELEFONICA|TEMASEK|TENNIS|TF|TG|TH|THD|THEATER|THEATRE|TICKETS|TIENDA|TIFFANY|TIPS|TIRES|TIROL|TJ|TK|TL|TM|TMALL|TN|TO|TODAY|TOKYO|TOOLS|TOP|TORAY|TOSHIBA|TOURS|TOWN|TOYOTA|TOYS|TR|TRADE|TRADING|TRAINING|TRAVEL|TRAVELERS|TRAVELERSINSURANCE|TRUST|TRV|TT|TUBE|TUI|TUSHU|TV|TW|TZ|UA|UBS|UG|UK|UNIVERSITY|UNO|UOL|US|UY|UZ|VA|VACATIONS|VANA|VC|VE|VEGAS|VENTURES|VERISIGN|VERSICHERUN
G|VET|VG|VI|VIAJES|VIDEO|VILLAS|VIN|VIP|VIRGIN|VISION|VISTA|VISTAPRINT|VIVA|VLAANDEREN|VN|VODKA|VOLKSWAGEN|VOTE|VOTING|VOTO|VOYAGE|VU|WALES|WALTER|WANG|WANGGOU|WATCH|WATCHES|WEATHER|WEBCAM|WEBER|WEBSITE|WED|WEDDING|WEIR|WF|WHOSWHO|WIEN|WIKI|WILLIAMHILL|WIN|WINDOWS|WINE|WME|WORK|WORKS|WORLD|WS|WTC|WTF|XBOX|XEROX|XIN|XN--11B4C3D|XN--1QQW23A|XN--30RR7Y|XN--3BST00M|XN--3DS443G|XN--3E0B707E|XN--3PXU8K|XN--42C2D9A|XN--45BRJ9C|XN--45Q11C|XN--4GBRIM|XN--55QW42G|XN--55QX5D|XN--6FRZ82G|XN--6QQ986B3XL|XN--80ADXHKS|XN--80AO21A|XN--80ASEHDB|XN--80ASWG|XN--90A3AC|XN--90AIS|XN--9DBQ2A|XN--9ET52U|XN--B4W605FERD|XN--C1AVG|XN--C2BR7G|XN--CG4BKI|XN--CLCHC0EA0B2G2A9GCD|XN--CZR694B|XN--CZRS0T|XN--CZRU2D|XN--D1ACJ3B|XN--D1ALF|XN--ECKVDTC9D|XN--EFVY88H|XN--ESTV75G|XN--FHBEI|XN--FIQ228C5HS|XN--FIQ64B|XN--FIQS8S|XN--FIQZ9S|XN--FJQ720A|XN--FLW351E|XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--G2XX48C|XN--GECRJ9C|XN--H2BRJ9C|XN--HXT814E|XN--I1B6B1A6A2E|XN--IMR513N|XN--IO0A7I|XN--J1AEF|XN--J1AMH|XN--J6W193G|XN--JLQ61U9W7B|XN--KCRX77D1X4A|XN--KPRW13D|XN--KPRY57D|XN--KPU716F|XN--KPUT3I|XN--L1ACC|XN--LGBBAT1AD8J|XN--MGB9AWBF|XN--MGBA3A3EJT|XN--MGBA3A4F16A|XN--MGBAAM7A8H|XN--MGBAB2BD|XN--MGBAYH7GPA|XN--MGBB9FBPOB|XN--MGBBH1A71E|XN--MGBC0A9AZCG|XN--MGBERP4A5D4AR|XN--MGBPL2FH|XN--MGBT3DHD|XN--MGBTX2B|XN--MGBX4CD0AB|XN--MK1BU44C|XN--MXTQ1M|XN--NGBC5AZD|XN--NGBE9E0A|XN--NODE|XN--NQV7F|XN--NQV7FS00EMA|XN--NYQY26A|XN--O3CW4H|XN--OGBPF8FL|XN--P1ACF|XN--P1AI|XN--PBT977C|XN--PGBS0DH|XN--PSSY2U|XN--Q9JYB4C|XN--QCKA1PMC|XN--QXAM|XN--RHQV96G|XN--S9BRJ9C|XN--SES554G|XN--T60B56A|XN--TCKWE|XN--UNUP4Y|XN--VERMGENSBERATER-CTB|XN--VERMGENSBERATUNG-PWB|XN--VHQUV|XN--VUQ861B|XN--WGBH1C|XN--WGBL6A|XN--XHQ521B|XN--XKC2AL3HYE2A|XN--XKC2DL3A5EE0H|XN--Y9A3AQ|XN--YFRO4I67O|XN--YGBI2AMMX|XN--ZFR164B|XPERIA|XXX|XYZ|YACHTS|YAMAXUN|YANDEX|YE|YODOBASHI|YOGA|YOKOHAMA|YOUTUBE|YT|ZA|ZARA|ZERO|ZIP|ZM|ZONE|ZUERICH|ZW)\b

This easily rejects emails like [email protected], [email protected], etc.

The domain name can be further edited if needed, e.g., specific country domain, etc.

Another list of top-level domains that updates frequently.
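Instead of hard-coding the alternation, the TLD can be checked against a set loaded from a published list. A minimal Java sketch, assuming the list is in the format of the IANA file at https://data.iana.org/TLD/tlds-alpha-by-domain.txt (a `#` comment header, then one upper-case TLD per line); the class and method names are hypothetical, and a tiny subset stands in for the downloaded file:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TldCheck {
    private final Set<String> tlds = new HashSet<>();

    // Accepts lines in the IANA list format: '#' starts a comment, otherwise one TLD per line.
    TldCheck(List<String> lines) {
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                tlds.add(t.toUpperCase(Locale.ROOT));
            }
        }
    }

    // True if the text after the last dot of the domain is a known TLD (case-insensitive).
    boolean hasKnownTld(String address) {
        int at = address.lastIndexOf('@');
        int dot = address.lastIndexOf('.');
        if (at < 0 || dot < at) {
            return false; // no '@', or no dot after it
        }
        return tlds.contains(address.substring(dot + 1).toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Hypothetical subset; a real list would be read from the downloaded file.
        TldCheck check = new TldCheck(Arrays.asList("# sample header", "COM", "ORG", "XN--P1AI"));
        System.out.println(check.hasKnownTld("user@example.com"));  // true
        System.out.println(check.hasKnownTld("user@example.comm")); // false
    }
}
```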

Upvotes: 2

Eggon

Reputation: 2356

I'd like to propose my approach, which is relatively simple while ensuring proper email structure and restricting forbidden characters. It is valid for Latin characters.

/^(?![\w\.@]*\.\.)(?![\w\.@]*\.@)(?![\w\.]*@\.)\w+[\w\.]*@[\w\.]+\.\w{2,}$/
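A minimal Java port of this pattern (class and method names are hypothetical). The three negative lookaheads reject "..", ".@", and "@." respectively. Because \w is ASCII-only in Java by default and + is not in the allowed set, plus-addressed forms such as john+tag@example.com are rejected.

```java
import java.util.regex.Pattern;

class LookaheadEmailCheck {
    private static final Pattern EMAIL = Pattern.compile(
            "^(?![\\w\\.@]*\\.\\.)"    // no ".." anywhere
            + "(?![\\w\\.@]*\\.@)"     // no "." directly before "@"
            + "(?![\\w\\.]*@\\.)"      // no "." directly after "@"
            + "\\w+[\\w\\.]*@[\\w\\.]+\\.\\w{2,}$");

    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.doe@example.com"));  // true
        System.out.println(isValid("john..doe@example.com")); // false
        System.out.println(isValid("john+tag@example.com"));  // false: '+' is not allowed
    }
}
```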

Upvotes: 1

ThinkTrans

Reputation: 71

The question title is fairly generic, but the body of the question indicates that it is about a PHP-based solution. I will try to address both.

Generically speaking, for all programming languages: "validating" an e-mail address with a regex is something that any internet-based service provider should typically refrain from. Domain names and e-mail addresses have grown so much in variety that any attempt at validation that is not well thought out may end up denying valid users access to your system. To avoid this, one of the best ways is to send an email to the user and verify that it was received. The good folks at the "Universal Acceptance Steering Group" have compiled a language-wise list of libraries that are found to be compliant or non-compliant with various parameters involving validation of Internationalized Domain Names and Internationalized Email addresses. Please find the links to those documents over here and here.

Speaking specifically of PHP:

There is one good library available in PHP: EmailValidator. It is an email address validator that includes many validation methods, such as DNS validation. The validation specifically recommended is called RFCValidation, and it validates email addresses against several RFCs. It has good compliance when it comes to being inclusive towards IDNs and Internationalized Email addresses.

Upvotes: 0

Coder12345

Reputation: 3753

I use multi-step validation. As there isn't any perfect way to validate an email address, a perfect regex can't be made, but at least you can notify the user that he/she is doing something wrong. Here is my approach:

  1. I first validate with the very basic regex which just checks if the email contains exactly one @ sign and it is not blank before or after that sign. e.g. /^[^@\s]+@[^@\s]+$/

  2. If the first validator does not pass (most addresses should pass it, although it is not perfect), warn the user that the email is invalid and do not allow him/her to continue with the input.

  3. If it passes, validate against a stricter regex, something which might disallow valid emails. If it does not pass, the user is warned about a possible error but is allowed to continue, unlike step (1), where the user is not allowed to continue because it is an obvious error.

So in other words, the first, liberal validation just catches obvious errors, and a failure is treated as an "error". People type a blank address, an address without an @ sign, and so on; this should be treated as an error. The second one is stricter, but a failure is treated as a "warning": the user is allowed to continue with the input but is warned to at least check whether he/she entered a valid entry. The key here is the error/warning approach, the error being something that, in 99.99% of cases, cannot be a valid email.

Of course, you can adjust what makes the first regex more liberal and the second one more strict.

Depending on what you need, the above approach might work for you.
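The error/warning split can be sketched in Java as follows. The first pattern is the basic one from step 1; the enum, class name, and the stricter second pattern are hypothetical choices of mine, so substitute whichever strict regex you prefer.

```java
import java.util.regex.Pattern;

class TwoStepEmailCheck {
    enum Result { ERROR, WARNING, OK }

    // Step 1: exactly one '@' with non-blank text on both sides; failing this is an error.
    private static final Pattern LIBERAL = Pattern.compile("^[^@\\s]+@[^@\\s]+$");
    // Step 2: a hypothetical stricter pattern; failing this is only a warning.
    private static final Pattern STRICT =
            Pattern.compile("^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\\.)+[A-Za-z]{2,}$");

    static Result check(String email) {
        if (email == null || !LIBERAL.matcher(email).matches()) {
            return Result.ERROR;   // block the input
        }
        if (!STRICT.matcher(email).matches()) {
            return Result.WARNING; // let the user continue after reviewing
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        System.out.println(check("no-at-sign"));       // ERROR
        System.out.println(check("user@localhost"));   // WARNING
        System.out.println(check("user@example.com")); // OK
    }
}
```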

Upvotes: 8

Suhaib Janjua

Reputation: 3574

I always use the regular expression below to validate email addresses. It covers the common formats of email addresses that use English (ASCII) characters; it does not accept quoted local parts or IP-address domains.

"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

Given below is a C# example:

Add the namespace import:

using System.Text.RegularExpressions;

and use the method below, which takes the email address and returns a boolean:

private bool IsValidEmail(string email) {
    const string pattern = @"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

    // Regex.IsMatch throws ArgumentNullException for null input,
    // so reject null and empty strings before matching.
    if (string.IsNullOrEmpty(email)) {
        return false;
    }

    // The pattern lists only lowercase letters, so match case-insensitively.
    return Regex.IsMatch(email, pattern, RegexOptions.IgnoreCase);
}

This method validates the email string passed as the parameter. It returns false when the parameter is null, empty, or not a valid email address, and returns true only when it contains a valid email address string.
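For comparison, here is a minimal Python sketch of the same pattern, assuming Python 3 and only the standard `re` module; `is_valid_email` is a hypothetical name. It uses `re.fullmatch` in place of the `\A ... \Z` anchors and passes `re.IGNORECASE`, since the character classes only list lower-case letters.

```python
import re
from typing import Optional

# Same pattern as above, minus \A ... \Z: re.fullmatch anchors both ends and,
# unlike \Z in .NET, does not tolerate a trailing newline.
PATTERN = re.compile(
    r"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)",
    re.IGNORECASE,  # the character classes only list lower-case letters
)

def is_valid_email(email: Optional[str]) -> bool:
    # None and "" are rejected before the regex runs
    return bool(email) and PATTERN.fullmatch(email) is not None
```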

Upvotes: 4

GooDeeJAY

Reputation: 1820

The regular expression that I use:

[\w-+]+([.][\w]+)?@[\w-+]+([.][a-z]{2,})+
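A Python sketch of this pattern, assuming the standard `re` module; `looks_like_email` is a hypothetical name. Python rejects `[\w-+]` with "bad character range" because `\w` cannot start a range, while JavaScript without the `u` flag quietly treats the hyphen as a literal. Moving the hyphen to the end of the class makes the intent unambiguous.

```python
import re

# Hyphen moved to the end of each class so it is a literal "-", not a range.
# Everything else follows the original pattern, including its limits:
# at most one dot in the local part, and TLD labels of lower-case letters only.
PATTERN = re.compile(r"[\w+-]+(?:\.\w+)?@[\w+-]+(?:\.[a-z]{2,})+")

def looks_like_email(address: str) -> bool:
    # fullmatch anchors the pattern; the original has no ^...$ anchors,
    # so a search() would also accept an address buried in other text.
    return PATTERN.fullmatch(address) is not None
```

Note that `first.middle.last@example.com` is rejected, because the local part allows only one dot.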

Upvotes: -3

alejandro juarez

Reputation: 163

If you need a simple form of validation, you can use the pattern from https://regexr.com/3e48o:

^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$

let r = new RegExp(String.raw `^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$`);

//should be true
console.log(r.test('[email protected]'));
console.log(r.test('[email protected]'));
console.log(r.test('[email protected]'));

//should be false
console.log(r.test('@domain.tld'));
console.log(r.test('[email protected]'));
console.log(r.test('name@domain.'));
console.log(r.test('namedomain.tld'));
console.log(r.test(''));

// Now that basic client-side validation is done, send a token from the server side to verify that the user actually has access to the email address
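A Python sketch of the same pattern, assuming the standard `re` module; `is_simple_email` is a hypothetical name. Python rejects `[\w-\.]` as a bad character range, so the hyphen moves to the end of the class, and `fullmatch` replaces `^...$`. It also shows a limit of this pattern: `{2,4}` rejects longer TLDs such as `.london`.

```python
import re

# Port of ^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$ ; fullmatch replaces ^...$
# (in Python, $ would also match just before a trailing newline).
PATTERN = re.compile(r"[\w.-]+@(?:[\w-]+\.)+[\w-]{2,4}")

def is_simple_email(address: str) -> bool:
    return PATTERN.fullmatch(address) is not None
```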

Upvotes: 1

Asad Ali Choudhry

Reputation: 5261

Although very detailed answers have already been posted, I think they are too complex for a developer who is just looking for a simple way to validate an email address, or to get all email addresses from a string, in Java. On Android, you can use the built-in pattern:

import androidx.annotation.NonNull;

public static boolean isEmailValid(@NonNull String email) {
    return android.util.Patterns.EMAIL_ADDRESS.matcher(email).matches();
}

As far as regular expressions are concerned, I always use this one, which works for my needs:

"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"

If you want to find all email addresses in a string by matching the email regular expression, you can find a method at this link.
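A Python sketch of that idea, assuming the standard `re` module and reusing the regex above without anchors, so that `findall` can locate addresses inside longer text; `find_emails` is a hypothetical name.

```python
import re
from typing import List

# The regex from this answer, used unanchored to find addresses in text.
# {2,6} caps the TLD length, so a longer TLD is cut short rather than skipped.
EMAIL = re.compile(r"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}")

def find_emails(text: str) -> List[str]:
    # With no capture groups, findall returns the full matched strings
    return EMAIL.findall(text)
```

For example, `find_emails("x@studio.photography")` returns `['x@studio.photog']`, which illustrates the six-letter cap.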

Upvotes: 4

Dave Black

Reputation: 8019

According to RFC 2821 and RFC 2822, the local part of an email address may use any of these ASCII characters:

  1. Uppercase and lowercase letters
  2. The digits 0 through 9
  3. The characters !#$%&'*+-/=?^_`{|}~
  4. The character ".", provided that it is not the first or last character of the local part and does not appear twice in a row.
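A small Python sketch of the local-part rules listed above, assuming only the standard library; `local_part_ok` is a hypothetical helper, and it handles unquoted local parts only. It also rejects two consecutive dots, which the RFC 2822 dot-atom grammar does not allow.

```python
import string

# Characters allowed in an unquoted local part, per the list above
ALLOWED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~.")

def local_part_ok(local: str) -> bool:
    if not local or any(ch not in ALLOWED for ch in local):
        return False
    # "." may not be the first or last character...
    if local[0] == "." or local[-1] == ".":
        return False
    # ...and may not appear twice in a row (dot-atom grammar)
    return ".." not in local
```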


For one that is RFC 2821 and 2822 compliant, you can use:

^((([!#$%&'*+\-/=?^_`{|}~\w])|([!#$%&'*+\-/=?^_`{|}~\w][!#$%&'*+\-/=?^_`{|}~\.\w]{0,}[!#$%&'*+\-/=?^_`{|}~\w]))[@]\w+([-.]\w+)*\.\w+([-.]\w+)*)$

Email - RFC 2821, 2822 Compliant

Upvotes: 4

partoftheorigin

Reputation: 43

Writing a regular expression that handles every case takes a lot of effort. Instead, you can use the pyIsEmail package.

The text below is taken from the pyIsEmail website.

pyIsEmail is a no-nonsense approach for checking whether that user-supplied email address could be real.

Regular expressions are cheap to write, but often require maintenance when new top-level domains come out or don’t conform to email addressing features that come back into vogue. pyIsEmail allows you to validate an email address – and even check the domain, if you wish – with one simple call, making your code more readable and faster to write. When you want to know why an email address doesn’t validate, they even provide you with a diagnosis.

Usage

For the simplest usage, import and use the is_email function:

from pyisemail import is_email

address = "[email protected]"
bool_result = is_email(address)
detailed_result = is_email(address, diagnose=True)

You can also check whether the domain used in the email is a valid domain and whether or not it has a valid MX record:

from pyisemail import is_email

address = "[email protected]"
bool_result_with_dns = is_email(address, check_dns=True)
detailed_result_with_dns = is_email(address, check_dns=True, diagnose=True)

These are primary indicators of whether an email address can even be issued at that domain. However, a valid response here is not a guarantee that the email exists, merely that it can exist.

In addition to the base is_email functionality, you can also use the validators by themselves. Check the validator source doc to see how this works.

Upvotes: 2

Hany Sakr

Reputation: 2929

I converted the regular expression into a Java string literal (with the backslashes escaped) so that it compiles:

String pattern = "(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

Upvotes: 1

Prassd Nidode

Reputation: 312


I use this function

function checkmail($value) {
    $value = trim($value);
    $at  = strpos($value, "@");   // position of the first "@" (false if none)
    $dot = strrpos($value, ".");  // position of the last "." (false if none)

    if ($at !== false && $dot !== false &&
        ($dot - $at > 2) &&                 // at least two characters between "@" and the last "."
        ($at > 1) &&                        // at least two characters before "@"
        (strlen($value) - $dot < 6) &&      // TLD of at most four characters...
        (strlen($value) - $dot > 2) &&      // ...and at least two
        ($value == preg_replace('/[^A-Za-z0-9\-_.@!*]/', '', $value)) // only whitelisted characters, so no spaces
    )
    {
        return true;
    }
    else {
        return "Invalid Mail-Id";
    }
}

Upvotes: 1

Simon_Weaver

Reputation: 145950

Just about every regular expression I've seen, including some used by Microsoft, rejects the following valid email address: [email protected]

I just had a real customer with an email address in this format who couldn't place an order.

Here's what I settled on:

  • A minimal regular expression that won't produce false negatives, or alternatively the MailAddress constructor with some additional checks (see below).
  • Checking for common typos such as .cmo or .gmial.com and asking for confirmation ("Are you sure this is your correct email address? It looks like there may be a mistake."). Allow the user to accept what they typed if they are sure.
  • Handling bounces when the email is actually sent, and manually reviewing them to check for obvious mistakes.

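using System.Net.Mail;

// Result type implied by the original snippet
enum EmailValidation { OK, PossibleTypo, Invalid }

static EmailValidation ValidateEmail(string str)
{
    try
    {
        // Throws (e.g. FormatException) if str is not a well-formed address
        var email = new MailAddress(str);

        // ".cmo" is a common typo for ".com": ask the user to confirm
        if (email.Host.EndsWith(".cmo"))
        {
            return EmailValidation.PossibleTypo;
        }

        // Require at least one dot in the host, and not a trailing one
        if (!email.Host.EndsWith(".") && email.Host.Contains("."))
        {
            return EmailValidation.OK;
        }
    }
    catch
    {
        return EmailValidation.Invalid;
    }

    // Host has no dot, or ends with one
    return EmailValidation.Invalid;
}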
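A Python sketch of the same three outcomes, assuming the standard `re` module. Python has no direct equivalent of `MailAddress`, so this swaps in a minimal pattern (something@something.something with no whitespace or extra "@"); the typo list and the `validate` name are hypothetical examples.

```python
import re
from enum import Enum

class EmailValidation(Enum):
    OK = "ok"
    POSSIBLE_TYPO = "possible_typo"
    INVALID = "invalid"

# Minimal pattern meant to avoid false negatives: something@something.something
MINIMAL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s.]+")

# Hypothetical list of common misspellings; extend it from your own bounce logs
TYPO_SUFFIXES = (".cmo", "@gmial.com")

def validate(address: str) -> EmailValidation:
    if not MINIMAL.fullmatch(address):
        return EmailValidation.INVALID
    if address.lower().endswith(TYPO_SUFFIXES):
        # Ask "Are you sure this is your correct email address?" but let the user accept it
        return EmailValidation.POSSIBLE_TYPO
    return EmailValidation.OK
```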

Upvotes: 4

FlameStorm

Reputation: 1004

For me, the right way to check email addresses is:

  1. Check that the @ symbol exists and that there are some non-@ characters before and after it: /^[^@]+@[^@]+$/
  2. Try to send an email with an "activation code" to this address.
  3. Once the user has "activated" the email address, you know that everything is right.

Of course, you can show a warning or tooltip in the front end when the user types a "strange" email, to help them avoid common mistakes such as no dot in the domain part, spaces in the name without quoting, and so on. But you must accept the address "hello@world" if the user really wants it.

Also, remember that the email address standard has evolved and can keep evolving, so you can't write a "standard-valid" regexp once and for all time. And remember that some real-world mail servers deviate from details of the common standard and in practice work with their own "modified standard".

So, just check @, hint user on frontend and send verification emails on the given address.
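A minimal Python sketch of the check-then-activate flow, assuming an in-memory dictionary in place of a database. `start_verification`, `confirm`, and the commented-out `send_activation_email` are hypothetical names, and actually sending the email is left out.

```python
import hmac
import re
import secrets

# Hypothetical in-memory store: address -> pending activation code.
# A real application would persist codes and give them an expiry time.
pending = {}

def start_verification(address: str) -> str:
    # Step 1: exactly one "@" with something on either side
    if not re.fullmatch(r"[^@]+@[^@]+", address):
        raise ValueError("expected one '@' with text on both sides")
    # Step 2: generate an unguessable code to email to the address
    code = secrets.token_urlsafe(16)
    pending[address] = code
    # send_activation_email(address, code)  # hypothetical mail helper
    return code

def confirm(address: str, code: str) -> bool:
    # Step 3: the user proves access by presenting the code they received
    expected = pending.get(address)
    if expected is None or not hmac.compare_digest(expected.encode(), code.encode()):
        return False
    del pending[address]  # one-time use
    return True
```

`hmac.compare_digest` is used so the comparison time does not depend on how many leading characters of a guess are correct.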

Upvotes: 4

Luna

Reputation: 1488

The HTML5 specification suggests a simple regex for validating email addresses:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

This intentionally doesn't comply with RFC 5322.

Note: This requirement is a wilful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The total length could also be limited to 254 characters, per RFC 3696 errata 1690.
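The same pattern as a Python sketch, assuming the standard `re` module, with the 254-character limit checked first; `is_html5_email` is a hypothetical name. Like the HTML5 rule, it accepts "hello@world" because no dot is required in the domain.

```python
import re

# The HTML5 "valid e-mail address" pattern ported to Python; "/" needs no
# escaping outside a JavaScript regex literal.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

MAX_LENGTH = 254  # RFC 3696 errata 1690

def is_html5_email(address: str) -> bool:
    return len(address) <= MAX_LENGTH and HTML5_EMAIL.fullmatch(address) is not None
```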

Upvotes: 26

Ondřej Šotek

Reputation: 1812

For PHP I'm using the email address validator from the Nette Framework:

/* public static */ function isEmail($value)
{
    $atom = "[-a-z0-9!#$%&'*+/=?^_`{|}~]"; // RFC 5322 unquoted characters in local-part
    $localPart = "(?:\"(?:[ !\\x23-\\x5B\\x5D-\\x7E]*|\\\\[ -~])+\"|$atom+(?:\\.$atom+)*)"; // Quoted or unquoted
    $alpha = "a-z\x80-\xFF"; // Superset of IDN
    $domain = "[0-9$alpha](?:[-0-9$alpha]{0,61}[0-9$alpha])?"; // RFC 1034 one domain component
    $topDomain = "[$alpha](?:[-0-9$alpha]{0,17}[$alpha])?";
    return (bool) preg_match("(^$localPart@(?:$domain\\.)+$topDomain\\z)i", $value);
}

Upvotes: 5

Ramesh Kotkar

Reputation: 766

You can use the following regular expression, which covers most email addresses, including IP-address domains in square brackets:

^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$

For PHP

function checkEmailValidation($email)
{
    $expression = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
    // preg_match returns 1 on a match, 0 on no match, and false on error
    return preg_match($expression, $email) === 1;
}

For JavaScript

function checkEmailValidation(email)
{
    // A regex literal, not a string: a string has no .test() method
    var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
    return pattern.test(email);
}

Upvotes: 0

Prasad Bhosale

Reputation: 722

The following is a regular expression for validating an email address:

^.+@\w+(\.\w+)+$
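A quick Python check of this pattern (standard `re`, with `fullmatch` in place of `^...$`; `check` is a hypothetical name) shows its trade-offs: `\w` excludes hyphens, so a domain like my-domain.com is rejected, while `.+` accepts anything before the last "@", including spaces and other "@" signs.

```python
import re

# ^.+@\w+(\.\w+)+$ ported with fullmatch; the group is made non-capturing
PATTERN = re.compile(r".+@\w+(?:\.\w+)+")

def check(address: str) -> bool:
    return PATTERN.fullmatch(address) is not None
```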

Upvotes: 3

zıəs uɐɟəʇs

Reputation: 1793

As mentioned already, you can't validate an email with a regex. However, here's what we currently use to make sure user-input isn't totally bogus (forgetting the TLD, etc.).

This regex will allow IDN domains and special characters (like Umlauts) before and after the @ sign.

/^[\w.+-]+@[^.][\w.-]*\.[\w-]{2,63}$/iu
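In a character class written as `[\w.+-_]`, the `+-_` part is a range (U+002B to U+005F) that also covers digits, upper-case letters and symbols such as "@", ":" and "<", so the local part can swallow extra "@" signs. A Python sketch comparing that class with the hyphen-last version, assuming the standard `re` module; in Python, `\w` in a `str` pattern already matches Unicode letters such as "ü", so no extra flag is needed for umlauts.

```python
import re

# "+-_" inside [...] is a range that includes "@"
RANGE_BUG = re.compile(r"[\w.+-_]+@[^.][\w.-]*\.[\w-]{2,63}", re.IGNORECASE)

# With the hyphen last it is a literal; "_" is already part of \w
FIXED = re.compile(r"[\w.+-]+@[^.][\w.-]*\.[\w-]{2,63}", re.IGNORECASE)
```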

Upvotes: 1

sunleo

Reputation: 10943

The JavaMail API can do this for us:

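import javax.mail.internet.InternetAddress;

public static boolean isValidEmail(String email)
{
    try
    {
        // The constructor parses the address; validate() checks it against RFC 822 syntax rules
        InternetAddress internetAddress = new InternetAddress(email);
        internetAddress.validate();
        return true;
    }
    catch (Exception ex)
    {
        // AddressException for malformed addresses (broad catch kept from the original)
        return false;
    }
}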

I got this from here.

Upvotes: 2

McGaz

Reputation: 1362

The regular expressions posted for this question are now out of date, because of the new generic top-level domains (gTLDs) being introduced (e.g. .london, .basketball, .通販). There are two ways to validate an email address that are relevant for the vast majority of cases:

  1. As the main answer says, don't use a regular expression. Just validate the address by sending an email to it (and catch exceptions for invalid addresses).
  2. Use a very generic regex to at least make sure that the input has an email structure like {something}@{something}.{something}. There's no point in going for a detailed regex, because you won't catch every case, and there'll be a new batch of TLDs in a few years, so you'll have to update your regular expression again.

I have decided to use the regular expression because, unfortunately, some users don't read forms and put the wrong data in the wrong fields. This will at least alert them when they try to put something which isn't an email into the email input field and should save you some time supporting users on email issues.

(.+)@(.{2,})\.(.{2,})
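A Python sketch of this rough {something}@{something}.{something} check, assuming the standard `re` module; `roughly_an_email` is a hypothetical name. Each repeated part is written as `.{2,}`: it accepts the same strings as `(.+){2,}`, but avoids nesting one quantifier inside another, which can make a backtracking engine take exponential time on input with no dot after the "@".

```python
import re

# At least one character before "@", then a dot with at least two
# characters on either side of it.
ROUGH_EMAIL = re.compile(r".+@.{2,}\..{2,}")

def roughly_an_email(text: str) -> bool:
    return ROUGH_EMAIL.fullmatch(text) is not None
```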

Upvotes: 3

Fragment

Reputation: 1585

A new top-level domain, "yandex", has nearly been added, so addresses such as [email protected] are possible. Uppercase letters are also supported, so a slightly modified version of acrosman's solution is:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,6})$

Upvotes: 2

Alexey Ossikine

Reputation: 113

If you want to improve on a regex that has been working reasonably well over several years, then the answer depends on what exactly you want to achieve - what kinds of email addresses have been failing. Fine-tuning email regexes is very difficult, and I have yet to see a perfect solution.

  • If your application involves something very technical in nature (or something internal to organizations), then maybe you need to support IP addresses instead of domain names, or comments in the "local" part of the email address.
  • If your application is multinational, I would consider focusing on Unicode and UTF-8 support.

The leading answer to your question currently links to a "fully RFC‑822–compliant regex". However, in spite of the complexity of that regex and its presumed attention to detail in RFC rules, it completely fails when it comes to Unicode support.

The regex that I've written for most of my applications focuses on Unicode support, as well as reasonably good overall adherence to RFC standards:

/^(?!\.)((?!.*\.{2})[a-zA-Z0-9\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF\.!#$%&'*+\-/=?^_`{|}~\-\d]+)@(?!\.)([a-zA-Z0-9\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF\-\.\d]+)((\.([a-zA-Z\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF]){2,63})+)$/i

I'll avoid copy-pasting complete answers, so I'll just link this to a similar answer I provided here: How to validate a unicode email?

There is also a live demo available for the regex above at: http://jsfiddle.net/aossikine/qCLVH/3/
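If your regex engine already treats `\w` as Unicode-aware (Python 3's `re` does for `str` patterns), a much shorter pattern can approximate the same rules. This is a rough Python sketch of that alternative, not a port of the regex above: `\w` covers the letters, digits and underscore of every script rather than the specific blocks listed, and `is_unicode_email` is a hypothetical name.

```python
import re

UNICODE_EMAIL = re.compile(
    r"(?!\.)(?!.*\.\.)"              # no leading dot, no ".." anywhere
    r"[\w.!#$%&'*+/=?^`{|}~-]+"      # local part
    r"@(?!\.)[\w.-]+"                # domain labels, no leading dot
    r"\.[^\W\d_]{2,63}"              # TLD: 2 to 63 letters from any script
)

def is_unicode_email(address: str) -> bool:
    return UNICODE_EMAIL.fullmatch(address) is not None
```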

Upvotes: 3
