Ben

Reputation: 712

regexp and rails validations

I have two custom validations:

  def validate_email
    regexp = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+"
    if sleep_email.present? && !sleep_email.match(regexp)
      errors.add(:sleep_email, "l'email indiqué semble ne pas avoir le bon format")
    end
  end

  def validate_website
    regexp = "(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
    if website.present? && !website.match(regexp)
      errors.add(:website, "l'url de votre site web doit avoir la forme de http://votresite.com")
    end
  end

But yo@yo and http://website are both accepted as valid. What's wrong?

Upvotes: 0

Views: 73

Answers (2)

Schwern

Reputation: 164739

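You're building regexes from strings. Strings and regexes each process backslash escapes, so you're effectively double escaping: the string consumes the backslash before the regex ever sees it. A sequence like \. in a double-quoted string is turned into a plain ., which matches any character.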

# This results in the regex /a.c/
p "abc".match?("a\.c")  # true

# This results in the desired regex /a\.c/
p "abc".match?("a\\.c")  # false

# This avoids the string escaping entirely.
p "abc".match?(%r{a\.c})  # false

To avoid this double escaping, use /.../ or %r{...} to create regexes.
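
As a rough sketch of that advice applied to the email pattern from the question: written as a regex literal, the pattern keeps its \. intact. The constant names and the \A and \z anchors are my own additions, not part of the original code; the anchors make the pattern cover the whole string instead of any substring.

```ruby
# String form, as in the question: "\." collapses to ".", which matches any character.
STRING_PATTERN = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+"

# Literal form: the backslash survives, and \A / \z anchor the whole string.
EMAIL_PATTERN = /\A[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+\z/

# The backslash is already gone from the string before any regex is built.
p STRING_PATTERN.include?("\\")            # => false

# No dot after the @, yet the string-built regex accepts it.
p "yo@yoyo".match?(STRING_PATTERN)         # => true
p "yo@yoyo".match?(EMAIL_PATTERN)          # => false
p "yo@example.com".match?(EMAIL_PATTERN)   # => true
```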


Don't try to validate email with a regex. Instead, use the validates_email_format_of gem which provides a proper validator you can use on any attribute.

validates :sleep_email, presence: true, email_format: true

If you want to see how to fully validate an email address, look at that gem's source.


Your URL regex does work.

regexp = "(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
p "http://website".match?(regexp)  # true

http://website is valid URL syntax. It's not the job of URL syntax to check whether the host actually exists.

If you also want to validate parts of the URL your regex will get increasingly complex. Instead, parse the URL with URI and then check its individual pieces as you like.
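
For illustration, here is roughly what parsing with the standard library's URI gives you to work with. The example URLs are made up; the point is that each piece comes back as a separate value you can check on its own.

```ruby
require 'uri'

uri = URI("https://votresite.com:8080/blog?page=2")
p uri.scheme  # => "https"
p uri.host    # => "votresite.com"
p uri.port    # => 8080
p uri.path    # => "/blog"
p uri.query   # => "page=2"

# No dot in the host, but still syntactically valid.
p URI("http://website").host  # => "website"

# Without a scheme there is no host at all: the whole string is the path.
bare = URI("votresite.com")
p bare.scheme  # => nil
p bare.host    # => nil
p bare.path    # => "votresite.com"

# Strings that cannot be parsed raise URI::InvalidURIError, a subclass of URI::Error.
begin
  URI("not a url")
rescue URI::InvalidURIError => e
  p e.class  # => URI::InvalidURIError
end
```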

Here's a custom validator I whipped up which parses the URI, checks that its scheme is allowed, and does a very rudimentary check on the host.

class UrlValidator < ActiveModel::EachValidator
  ALLOWED_SCHEMES = ['http', 'https']

  private def allowed_schemes
    options[:allowed_schemes] || ALLOWED_SCHEMES
  end

  # EachValidator calls validate_each (not validates_each) for every attribute.
  def validate_each(record, attribute, value)
    uri = URI(value)

    if !allowed_schemes.include?(uri.scheme)
      record.errors.add(attribute, :scheme_not_allowed, message: "Scheme #{uri.scheme} is not allowed")
    end

    # Has to have at least xxx.yyy
    # This is a pretty sloppy host check.
    # uri.host is nil when the input has no host part (e.g. "website"), hence to_s.
    if !uri.host.to_s.match?(/\w+\.\w+/)
      record.errors.add(attribute, :host_not_allowed, message: "Host #{uri.host} is not allowed")
    end
  rescue URI::Error
    record.errors.add(attribute, :not_a_uri)
  end
end

# In your model:
validates :website, url: true

If you wanted to allow other schemes, like ftp...

validates :website, url: { allowed_schemes: ['http', 'https', 'ftp'] }

If you wanted true domain validation, you could add a DNS lookup.

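  # Needs require 'resolv' at the top of the file.
  begin
    Resolv::DNS.open do |dns|
      # Raises Resolv::ResolvError if no address is found for the host.
      dns.getaddress(uri.host)
    end
  rescue Resolv::ResolvError
    record.errors.add(attribute, :invalid_host, message: "#{uri.host} could not be resolved")
  end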

However, this lookup makes a DNS query every time the record is validated, which has a performance impact.
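
One way to soften that cost, sketched under my own assumptions: remember the result per host and cap how long a query may take. HostResolver, its lookup: parameter, and the 2-second timeout are hypothetical choices for this example, not part of Rails or Resolv. The lookup is injectable so the example below runs without touching the network.

```ruby
require 'resolv'

# Hypothetical helper: caches lookup results so repeated validations
# of the same host do not query DNS again.
class HostResolver
  # Default lookup: a real DNS query with a 2-second timeout.
  DEFAULT_LOOKUP = lambda do |host|
    Resolv::DNS.open do |dns|
      dns.timeouts = 2
      dns.getaddress(host)
    end
  end

  def initialize(lookup: DEFAULT_LOOKUP)
    @lookup = lookup
    @cache = {}
  end

  # Returns true if the host resolves, false otherwise.
  def resolvable?(host)
    return @cache[host] if @cache.key?(host)

    @cache[host] = begin
      @lookup.call(host)
      true
    rescue Resolv::ResolvError
      false
    end
  end
end

# Stand-in lookup so the example runs offline.
fake_lookup = lambda do |host|
  raise Resolv::ResolvError, "no address for #{host}" unless host == "votresite.com"
  "192.0.2.1"
end

resolver = HostResolver.new(lookup: fake_lookup)
p resolver.resolvable?("votresite.com")  # => true
p resolver.resolvable?("website")        # => false
```

Note that this cache never expires, so a long-running process would want some eviction policy on top of it.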

Upvotes: 2

DPAMonty

Reputation: 181

The standard email regex (RFC 5322 Official Standard) to use is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

As for the website URL, use this one. The URL will only be valid if the TLD (.com, .net, etc.) is included.

^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$
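
One caveat if you use this in Ruby: ^ and $ match at the start and end of each line, not of the whole string, and Rails' format validator refuses patterns using them unless you pass multiline: true. A minimal sketch of the same pattern with \A and \z instead (the constant names and test strings are mine, for illustration only):

```ruby
# The pattern above as a regex literal, with ^ and $ swapped for \A and \z.
URL_PATTERN = %r{\A(http|https)://[a-z0-9]+([\-.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(/.*)?\z}

# The same pattern with the original line anchors, for comparison.
LINE_ANCHORED = %r{^(http|https)://[a-z0-9]+([\-.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(/.*)?$}

p "http://votresite.com".match?(URL_PATTERN)  # => true
p "http://website".match?(URL_PATTERN)        # => false

# With line anchors, a valid URL on any one line is enough to pass.
multiline = "javascript:alert(1)\nhttp://votresite.com"
p multiline.match?(LINE_ANCHORED)  # => true
p multiline.match?(URL_PATTERN)    # => false
```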

Upvotes: 1
