T0ny lombardi

Reputation: 1920

Ruby Email validation with regex

I am running through a large list of email addresses, many of which contain typos. I am trying to build a regex that checks whether an address is valid.

This is the regex I have so far:

def is_a_valid_email?(email)
  (email =~ /^(([A-Za-z0-9]*\.+*_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\+)|([A-Za-z0-9]+\+))*[A-Za-z0-9]+@{1}((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,4}$/i)
end

It passes if an email has underscores and only one period. Many of my emails have more than one period in the name itself. How do I check for that in the regex?

[email protected] # <~~ valid
foo.bar#gmail.co.uk # <~~~ not valid
[email protected] # <~~~valid 
[email protected] # <~~ not valid 
get_at_m.e@gmail  #<~~ valid

Can someone help me rewrite my regex?

Upvotes: 78

Views: 88753

Answers (16)

Joshua Hunter

Reputation: 4458

This has been built into the standard library since at least Ruby 2.2.1:

URI::MailTo::EMAIL_REGEXP

Warning: the above considers a@b to be a valid email address.

Upvotes: 145

Victor Hazbun

Reputation: 11

Stop validating emails with regex. Instead, send an email containing a secret token.

If you really need to validate, use the HTML5 email validation regex (there is an example running on Regex101).

Or use the most minimalistic regex:

/@/

Upvotes: 1

Bhargav Thummar

Reputation: 1

To check for a valid email address, we use this regular expression:

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

Upvotes: 0

noraj

Reputation: 4622

TL;DR

Any custom regexp you'll find on the internet, including URI::MailTo::EMAIL_REGEXP, is wrong.

Here is what you should use:

# The closest thing to RFC_5322
RFC_5322 = /\A(?:[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z/i

# Lighter, more practical version of RFC_5322 that will be more useful in real life
RFC_5322_light = /\A[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i

# Same as the light version but with length limits enforced
RFC_5322_with_length = /\A(?=[a-z0-9@.!#$%&'*+\/=?^_‘{|}~-]{6,254}\z)(?=[a-z0-9.!#$%&'*+\/=?^_‘{|}~-]{1,64}@)[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+)*@(?:(?=[a-z0-9-]{1,63}\.)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?=[a-z0-9-]{1,63}\z)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i

Details

The latest RFC defining the email address format is RFC 5322 (Internet Message Format).

You can check section 3.4.1, Addr-Spec Specification. If we only look at the first part, the @ splits the local part (on the left) from the domain (on the right).

addr-spec = local-part "@" domain

local-part = dot-atom / quoted-string / obs-local-part

For example, the local part can contain a dot-atom or a quoted-string, both of which are defined in the same RFC.

It's a bit complex, but your email address can contain many ASCII special characters that many regexps exclude (like #, $, &, etc.).
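As a rough illustration of the dot-atom form only (quoted strings, comments and the obsolete syntax are ignored), here is a sketch that assumes the atext character set of RFC 5322 section 3.2.3. The constant names ATEXT and DOT_ATOM_TEXT are made up for this example.

```ruby
# atext: letters, digits and the special characters RFC 5322 allows in an atom.
ATEXT = /[a-z0-9!\#$%&'*+\/=?^_`{|}~-]/i

# dot-atom-text: runs of atext separated by single dots.
DOT_ATOM_TEXT = /\A#{ATEXT}+(?:\.#{ATEXT}+)*\z/

['foo.bar', 'a#b$c&d', 'foo..bar', '.foo', 'foo bar'].each do |local_part|
  puts "#{local_part.inspect}: #{DOT_ATOM_TEXT.match?(local_part)}"
end
```

The first two local parts match, while consecutive dots, a leading dot and a space are rejected.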

On the other hand, URI::MailTo::EMAIL_REGEXP is defined in ruby/lib/uri/mailto.rb with the following regexp:

EMAIL_REGEXP = /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

The comment above this regexp suggests they followed the recommendations at https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address.

A valid email address is a string that matches the email production of the following ABNF, the character set for which is Unicode. This ABNF implements the extensions described in RFC 1123. [ABNF] [RFC5322] [RFC1034] [RFC1123]

But the WHATWG spec adds the following comment, which is very important:

This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

So WHATWG is telling us that they deliberately did not follow the RFC that standardizes the email address format. They say the domain part is too vague in RFC 5322, but RFC 5322 includes this note telling us to check other RFCs for a more complete domain format spec:

Note: A liberal syntax for the domain portion of addr-spec is given here. However, the domain portion contains addressing information specified by and used in other protocols (e.g., [RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore incumbent upon implementations to conform to the syntax of addresses for the context in which they are used.

WHATWG also tells us that the local-part in RFC 5322 is too strict. But look at what URI::MailTo::EMAIL_REGEXP, which follows the WHATWG spec instead, accepts:

URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('-@z') # => true
URI::MailTo::EMAIL_REGEXP.match?('++++++++.........@z') # => true

On the contrary, the WHATWG spec (and so URI::MailTo::EMAIL_REGEXP) is way too lax.

At https://emailregex.com/ I found a General Email Regex (RFC 5322 Official Standard) (see the summary).

The explanation and alternatives can be found at https://www.regular-expressions.info/email.html.

# Blind RFC 5322
\A(?:[a-z0-9!#$%&'*+/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_‘{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z

# RFC 5322, practical version (omits IP addresses, domain-specific addresses, and the syntax using double quotes and square brackets)
\A[a-z0-9!#$%&'*+/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_‘{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z

# RFC 5322, practical version (same as the previous one, plus length limits enforced)
\A(?=[a-z0-9@.!#$%&'*+/=?^_‘{|}~-]{6,254}\z)(?=[a-z0-9.!#$%&'*+/=?^_‘{|}~-]{1,64}@)[a-z0-9!#$%&'*+/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_‘{|}~-]+)*@(?:(?=[a-z0-9-]{1,63}\.)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?=[a-z0-9-]{1,63}\z)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z

And as you can see in the screenshot below, none of the addresses accepted by WHATWG / URI::MailTo::EMAIL_REGEXP is valid.

[Screenshot: invalid email addresses]

Let's do the same thing locally:

RFC_5322 = /\A(?:[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_‘{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z/i

Now we can compare both (on Ruby 3.2.0):

# WHATWG
## Invalid cases
URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('-@z') # => true
URI::MailTo::EMAIL_REGEXP.match?('++++++++.........@z') # => true
URI::MailTo::EMAIL_REGEXP.match?('invalí[email protected]') # => false
URI::MailTo::EMAIL_REGEXP.match?('invalid%$£"@domain.com') # => false
URI::MailTo::EMAIL_REGEXP.match?('invalid£@domain.com') # => false
URI::MailTo::EMAIL_REGEXP.match?('invali"[email protected]') # => false
URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('!#$%’*+-/=?^_`{|}[email protected]') # => false
## Valid cases
URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('valid%[email protected]') # => true
URI::MailTo::EMAIL_REGEXP.match?('"valid"@domain.com') # crash with error NameError

# RFC 5322
## Invalid cases
RFC_5322.match?('[email protected]') # => false
RFC_5322.match?('-@z') # => false
RFC_5322.match?('++++++++.........@z') # => false
RFC_5322.match?('invalí[email protected]') # => false
RFC_5322.match?('invalid%$£"@domain.com') # => false
RFC_5322.match?('invalid£@domain.com') # => false
RFC_5322.match?('invali"[email protected]') # => false
RFC_5322.match?('[email protected]') # => false
RFC_5322.match?('!#$%’*+-/=?^_`{|}[email protected]') # => false
## Valid cases
RFC_5322.match?('[email protected]') # => true
RFC_5322.match?('[email protected]') # => true
RFC_5322.match?('[email protected]') # => true
RFC_5322.match?('[email protected]') # => true
RFC_5322.match?('valid%[email protected]') # => true
RFC_5322.match?('"valid"@domain.com') # => true

# RFC 5322 light (same results with RFC_5322_with_length)
## Invalid cases
RFC_5322_light.match?('[email protected]') # => false
RFC_5322_light.match?('-@z') # => false
RFC_5322_light.match?('++++++++.........@z') # => false
RFC_5322_light.match?('invalí[email protected]') # => false
RFC_5322_light.match?('invalid%$£"@domain.com') # => false
RFC_5322_light.match?('invalid£@domain.com') # => false
RFC_5322_light.match?('invali"[email protected]') # => false
RFC_5322_light.match?('[email protected]') # => false
RFC_5322_light.match?('!#$%’*+-/=?^_`{|}[email protected]') # => false
## Valid cases
RFC_5322_light.match?('[email protected]') # => true
RFC_5322_light.match?('[email protected]') # => true
RFC_5322_light.match?('[email protected]') # => true
RFC_5322_light.match?('[email protected]') # => true
RFC_5322_light.match?('valid%[email protected]') # => true
RFC_5322_light.match?('"valid"@domain.com') # => false (difference with "pure" version)

Warning: this test is not complete and does not cover all cases.

Upvotes: 6

VENKATESH KARNI

Reputation: 66

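Validating multiple emails with a regex in a Rails controller:

emails = "[email protected],[email protected]" # etc...
email_regexp = /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i
# Check each comma-separated address on its own; the regex matches a single address.
if emails.split(",").all? { |email| email.strip.match?(email_regexp) }
  # Send invitations and create users here
else
  flash[:error] = "Invalid emails"
end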

Upvotes: 0

sandre89

Reputation: 5898

The accepted answer suggests using URI::MailTo::EMAIL_REGEXP.

However, this regexp considers 1234@1234 a valid e-mail address, which is something you probably don't want in a real-life app (for instance, AWS SES will throw an exception if you try to send an e-mail to an address like this).

As Darpan points out in the comments, you can simply replace the trailing * in that regexp with +, and it will work as expected. The resulting regex is:

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/

Since the original URI::MailTo regexp, whilst technically valid according to the spec, is imho useless for our needs, we "fix" it in the Devise initializer.

# in config/initializers/devise.rb, put this at the beginning of the file
URI::MailTo.send(:remove_const, :EMAIL_REGEXP)
URI::MailTo.const_set(:EMAIL_REGEXP, /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/)

# And then find `config.email_regexp` (it will already be there in the file) and change it to:
config.email_regexp = URI::MailTo::EMAIL_REGEXP

If you're wondering why this monkeypatch isn't put in a separate initializer file: you would have to name the initializer file something like 00_xxx.rb to make it load before the Devise initializer. This goes against the recommendation in the Rails docs, which suggest using a single initializer for cases like this:

If an initializer has code that relies on code in another initializer, you can combine them into a single initializer instead. This makes the dependencies more explicit, and can help surface new concepts within your application. Rails also supports numbering of initializer file names, but this can lead to file name churn.

Upvotes: 9

diabolist

Reputation: 4099

Use

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/

Explanation below.

Whilst Joshua Hunter's answer is great, URI::MailTo::EMAIL_REGEXP has a significant flaw in my opinion.

It matches fred@example, which causes Net::SMTPSyntaxError: 501 5.1.3 Bad recipient address syntax errors.

URI::MailTo::EMAIL_REGEXP evaluates to

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

Changing the last star to a plus makes it better.

Note: this is pointed out in Darpan's comment to Joshua Hunter's answer, but I think it deserves its own answer to make it more visible.

Upvotes: 2

jeffdill2

Reputation: 4114

If you're using Devise, you can also use their included regex via:

Devise.email_regexp

which returns:

/\A[^@\s]+@[^@\s]+\z/

Upvotes: 2

Mike H-R

Reputation: 7815

TL;DR:

Credit goes to @joshuahunter (below, upvote his answer). Included here so that people see it.

URI::MailTo::EMAIL_REGEXP

Old TL;DR

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

Original answer

You seem to be complicating things a lot. I would simply use:

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

which is taken from Michael Hartl's Rails book.

Since this doesn't meet your dot requirement, it can simply be amended like so:

VALID_EMAIL_REGEX = /\A([\w+\-]\.?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

As mentioned by CAustin, there are many other solutions.

EDIT:

It was pointed out by @installero that the original fails for subdomains with hyphens in them; this version will work (not sure why the character class was missing digits and hyphens in the first place).

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

Upvotes: 127

Kiry Meas

Reputation: 1242

Try this:

/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

It selects only the email address from strings such as these:

"Robert Donhan" <[email protected]>sadfadf
Robert Donhan <[email protected]>
"Robert Donhan" [email protected]
Robert Donhan [email protected]

Upvotes: -1

kaikuchn

Reputation: 795

Nowadays Ruby provides an email validation regexp in its standard library. You can find it in the URI::MailTo module, it's URI::MailTo::EMAIL_REGEXP. In Ruby 2.4.1 it evaluates to

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

But I'd just use the constant itself.

Upvotes: 15

bdbasinger

Reputation: 175

Yours is complicated indeed.

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

The above code should suffice.

Explanation of each piece of the expression above for clarification:

Start of regex:

/

Match the start of a string:

\A

At least one word character, plus, hyphen, or dot:

[\w+\-.]+

A literal "at sign":

@

At least one letter, digit, hyphen, or dot (the domain name):

[a-z\d\-.]+

A literal dot:

\.

At least one letter:

[a-z]+

Match the end of a string:

\z

End of regex:

/

Case insensitive:

i

Putting it back together again:

/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

Check out Rubular to conveniently test your expressions as you write them.

Upvotes: 2

ilgam

Reputation: 4420

This one is shorter and safe:

/\A[^@\s]+@[^@\s]+\z/

This regex is used in the Devise gem, but it wrongly accepts values such as these:

  ".....@a....",
  "david.gilbertson@SOME+THING-ODD!!.com",
  "a.b@example,com",
  "a.b@example,co.de"

I prefer to use the regexp from the Ruby standard library, URI::MailTo::EMAIL_REGEXP.
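A small sketch comparing the two regexps on the values listed above. The constant name DEVISE_STYLE is made up here, and the pattern is copied from this answer rather than read from the Devise gem.

```ruby
require 'uri'

DEVISE_STYLE = /\A[^@\s]+@[^@\s]+\z/

suspicious = [
  ".....@a....",
  "david.gilbertson@SOME+THING-ODD!!.com",
  "a.b@example,com",
  "a.b@example,co.de"
]

suspicious.each do |address|
  puts format('%-40s devise_style=%-5s stdlib=%s', address,
              DEVISE_STYLE.match?(address),
              URI::MailTo::EMAIL_REGEXP.match?(address))
end
```

Every value passes the Devise-style pattern and fails the standard-library one.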

There is also a gem for email validation:

Email Validator

Upvotes: 21

Michele Riva

Reputation: 562

This works well for me:

# Anchored with \A and \z so that the whole string has to be an address
if email.match?(/\A[a-z0-9]+[_a-z0-9.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})\z/)
  puts 'matches!'
else
  puts "it doesn't match!"
end

Upvotes: -1

installero

Reputation: 9766

I guess the example from the book can be improved to match emails with - in subdomain.

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

For example:

> '[email protected]' =~ VALID_EMAIL_REGEX
=> 0

Upvotes: 5

John Carney

Reputation: 639

Here's a great article by David Celis explaining why every single regular expression you can find for validating email addresses is wrong, including the ones above posted by Mike.

From the article:

The local string (the part of the email address that comes before the @) can contain the following characters:

    ! $ & * - = ` ^ | ~ # % ' + / ? _ { }

But guess what? You can use pretty much any character you want if you escape it by surrounding it in quotes. For example, "Look at all these spaces!"@example.com is a valid email address. Nice.

If you need to do a basic check, the best regular expression is simply /@/.
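To illustrate the point, here is a sketch that runs the quoted-local-part address from the article through a typical hand-written pattern and through /@/. The constant name HARTL_STYLE is made up; the pattern is the one posted by Mike in this thread.

```ruby
HARTL_STYLE = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

quoted = '"Look at all these spaces!"@example.com'

puts quoted.match?(HARTL_STYLE) # false: quotes and spaces are not in the character class
puts quoted.match?(/@/)         # true: the basic check only looks for an @
```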

Upvotes: 27
