user3277633

Reputation: 1923

Rails email validation format and regex

I'm currently following Michael Hartl's Rails tutorial.

Given the following tests in Rails:

  test "email validation should accept valid addresses" do
    valid_addresses = %w[[email protected] [email protected] [email protected]
                         [email protected] [email protected]]
    valid_addresses.each do |valid_address|
      @user.email = valid_address
      assert @user.valid?, "#{valid_address.inspect} should be valid"
    end
  end

  test "email validation should reject invalid addresses" do
    invalid_addresses = %w[user@example,com user_at_foo.org user.name@example.
                           foo@bar_baz.com foo@bar+baz.com]
    invalid_addresses.each do |invalid_address|
      @user.email = invalid_address
      assert_not @user.valid?, "#{invalid_address.inspect} should be invalid"
    end
  end

and the following regex for email format validation:

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
validates :email, presence: true, format: { with: VALID_EMAIL_REGEX }

Can someone explain to me what the tests are testing with respect to the regex? Why are the valid tests only [email protected], [email protected], and so on? What if I add another element to valid_addresses, such as [email protected]? Why did Michael specifically choose these 5 example emails as valid_addresses and those 5 as invalid_addresses?

If the regex already checks every format and only matches the specific ones it allows, why do we need to test it at all?

Upvotes: 1

Views: 5074

Answers (3)

S.Klatt

Reputation: 21

For anyone reading this in 2023:

You can use:

validates :email, format: { with: URI::MailTo::EMAIL_REGEXP }
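`URI::MailTo::EMAIL_REGEXP` ships with Ruby's standard `uri` library, so it can be compared with the tutorial's regex without Rails. A minimal sketch; the sample addresses are made up for illustration, and the exact stdlib pattern may vary between Ruby versions:

```ruby
require "uri"

# The regex from the tutorial, for comparison.
TUTORIAL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Hypothetical sample addresses.
samples = %w[user@example.com foo@bar_baz.com user@localhost foo@bar..com]

samples.each do |email|
  stdlib   = URI::MailTo::EMAIL_REGEXP.match?(email)
  tutorial = TUTORIAL_REGEX.match?(email)
  puts format("%-18s stdlib=%-5s tutorial=%s", email, stdlib, tutorial)
end
```

The two agree on the first two samples but disagree on the last two: the stdlib pattern accepts a host without a dot such as `user@localhost` and rejects the doubled dot in `foo@bar..com`, while the tutorial regex does the opposite. So switching regexes can change which of your existing tests pass.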

Upvotes: 1

jyrkim

Reputation: 2869

I think the best way to get accustomed to regular expressions is to experiment with them. Try Rubular.com (as recommended in the book): paste \A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z into the regular expression field and put the letter i in the options box that follows it. Then paste an email address such as user@example,com into the test string field. You'll notice that it does not match, but if you replace the comma with a dot, it will. The 2nd invalid email address simply tests that the @ character is required (it is missing in this case).

The 3rd invalid email address tests that the suffix (the top-level domain after the final dot) contains 1 or more letters. The 4th tests that there are no underscores after the @, and the 5th tests that there is no + character after the @.
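The same experiment can be run in irb or a plain Ruby script instead of Rubular. A minimal sketch that checks the five invalid addresses from the question's test against the tutorial's regex, then repeats the comma-to-dot replacement:

```ruby
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

invalid_addresses = %w[user@example,com user_at_foo.org user.name@example.
                       foo@bar_baz.com foo@bar+baz.com]

# Each invalid address should fail to match, just as in the Rubular experiment.
invalid_addresses.each do |address|
  puts "#{address.inspect} matches? #{VALID_EMAIL_REGEX.match?(address)}"
end

# Replacing the comma with a dot turns the first address into a match.
puts VALID_EMAIL_REGEX.match?("user@example,com".tr(",", "."))
```

Every line in the loop prints `false`, which is what `assert_not @user.valid?` relies on for these addresses.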

The valid email addresses test basically the same things, but in them the underscores and plus signs appear in the part of the address where they are allowed (before the @). They also test that an uppercase address such as [email protected] is accepted. That works because of the i (case-insensitive) flag on the regex, not because of the before_save { self.email = email.downcase } callback in the User model: valid? only runs the validations, so that callback does not fire in this test.
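One way to see what lets an uppercase address through the format check is to compare the regex with and without its `i` flag in plain Ruby. The address here is a made-up example:

```ruby
WITH_I    = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
WITHOUT_I = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/

address = "USER@FOO.COM"

puts WITH_I.match?(address)             # true: with i, [a-z] also matches A-Z
puts WITHOUT_I.match?(address)          # false: "FOO.COM" has no lowercase letters
puts WITHOUT_I.match?(address.downcase) # true once the string is lowercase
```

In Rails, `save` runs validations before `before_save` callbacks, so downcasing in `before_save` affects what gets stored, while the validation itself sees the original casing.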

Upvotes: 0

Sam

Reputation: 20486

Let us break down the expression (keep in mind the i modifier makes it case insensitive):

\A          (?# anchor to the beginning of the string)
[\w+\-.]+   (?# match 1+ a-z, A-Z, 0-9, +, _, -, or .)
@           (?# match literal @)
[a-z\d\-.]+ (?# match 1+ a-z, 0-9, -, or .)
\.          (?# match literal .)
[a-z]+      (?# match 1+ a-z)
\z          (?# anchor to the absolute end of the string)
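Ruby also lets you keep this kind of annotated breakdown in the code itself: with the `x` (extended) flag, whitespace in the pattern is ignored and `#` starts a comment. A sketch of the same regex written that way (`ANNOTATED_EMAIL_REGEX` is just an illustrative name):

```ruby
ANNOTATED_EMAIL_REGEX = /
  \A          # anchor to the beginning of the string
  [\w+\-.]+   # 1+ letters, digits, _, +, - or .
  @           # literal @
  [a-z\d\-.]+ # 1+ letters, digits, - or .
  \.          # literal .
  [a-z]+      # 1+ letters for the TLD
  \z          # anchor to the absolute end of the string
/ix

COMPACT_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Both forms should accept and reject the same strings.
%w[user@example.com user@example,com foo@bar_baz.com].each do |email|
  puts "#{email}: #{ANNOTATED_EMAIL_REGEX.match?(email)} / #{COMPACT_EMAIL_REGEX.match?(email)}"
end
```

The extended form is longer, but the explanation lives next to the pattern instead of in a separate note.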

This is what the tutorial defines as an email address (in reality, the rules are much more complicated). So the author, Michael Hartl, wrote a couple of tests with "valid" and "invalid" emails (according to the above definitions).

Pretty much, the "user" part can contain letters, digits, _, +, - and . (dot). The "domain" can contain letters, digits, - and . (dot). And the "TLD" can only contain letters. The first 5 emails exercise several variations of these rules as "acceptable" emails. The last 5 emails fail for the following reasons:

  • user@example,com - the comma can't be matched
  • user_at_foo.org - no @
  • user.name@example. - no TLD after the final .
  • foo@bar_baz.com - the domain can't contain _
  • foo@bar+baz.com - the domain can't contain +
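The "user" / "domain" / "TLD" split can be made visible with named capture groups. This is a sketch for inspecting the parts; wrapping the pieces in groups does not change which strings match. The group names `user`, `domain` and `tld` and the sample address are illustrative:

```ruby
PARTS_REGEX = /\A(?<user>[\w+\-.]+)@(?<domain>[a-z\d\-.]+)\.(?<tld>[a-z]+)\z/i

m = PARTS_REGEX.match("first.last@mail.example.org")
puts m[:user]   # first.last
puts m[:domain] # mail.example
puts m[:tld]    # org

# An address that breaks the rules produces no match at all.
p PARTS_REGEX.match("foo@bar_baz.com") # nil
```

Note that the domain group is greedy and then backs off to the last dot, so everything before the final `.` counts as the domain.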

Obviously, if you want more specific emails to match (or not match), add them to the corresponding array of tests. If your test fails, you know you will need to update your expression :)
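As an example of that workflow: the tutorial's regex accepts a doubled dot in the domain, such as `foo@bar..com`, because `[a-z\d\-.]+` allows consecutive dots. Adding that address to the invalid list would make the test fail. One possible tightened expression (an assumption for illustration, not code from the tutorial) requires every domain label to be non-empty:

```ruby
ORIGINAL_REGEX  = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
# Domain: labels without dots, joined by single dots, followed by the TLD.
TIGHTENED_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(?:\.[a-z\d\-]+)*\.[a-z]+\z/i

# The original invalid list plus the new case.
invalid_addresses = %w[user@example,com user_at_foo.org user.name@example.
                       foo@bar_baz.com foo@bar+baz.com foo@bar..com]

invalid_addresses.each do |address|
  puts "#{address}: original=#{ORIGINAL_REGEX.match?(address)} " \
       "tightened=#{TIGHTENED_REGEX.match?(address)}"
end
```

Only `foo@bar..com` differs: the original regex matches it and the tightened one does not, while both still reject the other five addresses.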

Upvotes: 2
