AnApprentice

Reputation: 110950

How to build a method to validate emails

I'd like to create a method like so:

def email_is_junk(email_address)
end

It should return true if the email is junk and false if it is not. The tricky part is that I want the logic to be based on conditions like the following:

[email protected]

Suggestions on how to write this method w/o requiring dozens of if blocks with regex?

Upvotes: 1

Views: 234

Answers (3)

the Tin Man

Reputation: 160551

Look at Ruby's Regexp.union and Regexp.escape methods. They make it easy to generate regex patterns based on text or regex strings.

This is from the union docs:

Return a Regexp object that is the union of the given patterns, i.e., will match any of its parts. The patterns can be Regexp objects, in which case their options will be preserved, or Strings. If no patterns are given, returns /(?!)/. The behavior is unspecified if any given pattern contains capture.

Regexp.union                         #=> /(?!)/
Regexp.union("penzance")             #=> /penzance/
Regexp.union("a+b*c")                #=> /a\+b\*c/
Regexp.union("skiing", "sledding")   #=> /skiing|sledding/
Regexp.union(["skiing", "sledding"]) #=> /skiing|sledding/
Regexp.union(/dogs/, /cats/i)        #=> /(?-mix:dogs)|(?i-mx:cats)/

And from the escape docs:

Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.

Regexp.escape('\*?{}.')   #=> \\\*\?\{\}\.
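To see why the escaping matters for a rule such as craigslist.org, here is a small sketch. The look-alike string is made up for this example and is not from the docs:

```ruby
# Without escaping, "." is a regex wildcard and matches any character.
loose  = Regexp.new('craigslist.org')
strict = Regexp.new(Regexp.escape('craigslist.org'))

puts loose.source   # craigslist.org
puts strict.source  # craigslist\.org

# A look-alike string (hypothetical) slips past the loose pattern.
lookalike = 'craigslistXorg'
puts loose.match?(lookalike)         # true
puts strict.match?(lookalike)        # false
puts strict.match?('craigslist.org') # true
```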

This is a starting point:

patterns = [
  /.+?\+.+?@/
]

strings = [
  'do-not-reply', 'support', 'test', 'service', 'tips', 'twitter', 'alerts', 'survey',
  'craigslist.org'
]

regex = Regexp.union(
  *patterns,
  *strings.map{ |s|
    Regexp.new( Regexp.escape("#{ s }@"), Regexp::IGNORECASE ) }
)
pp regex

>> /(?-mix:.+?\+.+?@)|(?i-mx:do\-not\-reply@)|(?i-mx:support@)|(?i-mx:test@)|(?i-mx:service@)|(?i-mx:tips@)|(?i-mx:twitter@)|(?i-mx:alerts@)|(?i-mx:survey@)|(?i-mx

Applying the above:

sample_email_addresses = %w[
    user
    user+foo
    do-not-reply
    support
    service
    tips
    twitter
    alerts
    survey
].map{ |e| e << '@host.com' }

pp sample_email_addresses.map{ |e| [e, !!e[regex]] }

>> [["user@host.com", false],
>> ["user+foo@host.com", true],
>> ["do-not-reply@host.com", true],
>> ["support@host.com", true],
>> ["service@host.com", true],
>> ["tips@host.com", true],
>> ["twitter@host.com", true],
>> ["alerts@host.com", true],
>> ["survey@host.com", true]]

The output lists each tested address with its result. true means the address matched the regex and is treated as junk; false means it did not match and is considered clean.

If you only want the ones that failed, i.e., matched the regex:

pp sample_email_addresses.select{ |e| e[regex] }

>> ["user+foo@host.com",
>>  "do-not-reply@host.com",
>>  "support@host.com",
>>  "service@host.com",
>>  "tips@host.com",
>>  "twitter@host.com",
>>  "alerts@host.com",
>>  "survey@host.com"]

If you only want the ones that passed, i.e., didn't trigger a hit in the regex:

pp sample_email_addresses.reject{ |e| e[regex] }

>> ["user@host.com"]
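Putting this together into the kind of method the question asks for might look like the sketch below. It is a variation rather than the exact starting point above: the constant names and the method name email_is_junk? are my own, the user-name rules are anchored so they match the whole local part, and the domain rule is matched after the "@" (in the starting point, 'craigslist.org' gets "@" appended like the user names, so it is tested as a user name rather than as a domain).

```ruby
# Hypothetical rule lists; adjust to taste.
JUNK_USERS   = %w[do-not-reply support test service tips twitter alerts survey].freeze
JUNK_DOMAINS = %w[craigslist.org].freeze

JUNK_REGEX = Regexp.union(
  /\A[^@]*\+[^@]*@/, # a "+" anywhere in the local part
  # local part is exactly one of the junk user names
  *JUNK_USERS.map { |u| Regexp.new("\\A#{Regexp.escape(u)}@", Regexp::IGNORECASE) },
  # domain is exactly one of the junk domains
  *JUNK_DOMAINS.map { |d| Regexp.new("@#{Regexp.escape(d)}\\z", Regexp::IGNORECASE) }
)

def email_is_junk?(email_address)
  JUNK_REGEX.match?(email_address)
end

puts email_is_junk?('user@host.com')          # false
puts email_is_junk?('user+foo@host.com')      # true
puts email_is_junk?('Support@host.com')       # true
puts email_is_junk?('someone@craigslist.org') # true
```

Adding a rule is then a one-word change to one of the two lists, with no extra if blocks.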

Upvotes: 1

Unixmonkey

Reputation: 18784

As an illustration of Zabba's comment above:

# Rules are regex source strings; String#match converts them to Regexp objects.
USER_RULES = ['\+', 'do-not-reply', 'support', 'test', 'service', 'tips', 'twitter', 'alerts', 'survey']
DOMAIN_RULES = ['craigslist\.org'] # escape the dot so it only matches a literal '.'

def email_is_junk(email)
  return true unless email.include?('@') # return early if no @
  user, domain = email.split('@', 2)     # limit of 2 keeps an empty domain as "" instead of nil
  USER_RULES.each   { |rule| return true if user.match(rule)   }
  DOMAIN_RULES.each { |rule| return true if domain.match(rule) }
  false # reached the end without matching anything
end
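The '\+' rule is written with a backslash because String#match treats a String argument as regex source. A short sketch of that behaviour, using made-up sample strings:

```ruby
# String#match converts a String argument to a Regexp before matching,
# so '\+' is the pattern /\+/ (a literal plus sign).
puts 'user+foo'.match('\+').inspect  # #<MatchData "+">
puts 'user'.match('\+').inspect      # nil

# An unescaped '+' is a quantifier with nothing to repeat, which raises RegexpError.
begin
  'user+foo'.match('+')
rescue RegexpError => e
  puts "invalid rule: #{e.message}"
end

# For a plain substring test, String#include? avoids regex semantics entirely.
puts 'user+foo'.include?('+')        # true
```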

Upvotes: 2

ndp

Reputation: 21996

Here's a JavaScript version. Not sure it can be much simpler than:

function isJunk(email) {
  return hasPlus(email) || supportLike(email) || craigsList(email);
}

// Each helper uses RegExp.prototype.test so it returns a boolean
// (String.prototype.match would return an array or null).
function craigsList(email) {
  return /@craigslist\.org/.test(email);
}

function supportLike(email) {
  return /do-not-reply|support|test|service|tips|twitter|alerts|survey/.test(email);
}

function hasPlus(email) {
  return /\+.*@/.test(email);
}

This is only a heuristic, so it's not 100% accurate. If you still have problems, consider verifying the address by sending the user an email containing a token.
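A rough sketch of the token idea in Ruby, since that is the language of the question. Everything here is hypothetical (the PENDING store and both method names), and actually sending the email is left out:

```ruby
require 'securerandom'

# Hypothetical in-memory store mapping token => email; a real app would
# persist this and expire old tokens.
PENDING = {}

# Create a random token for the address; the caller would email it to the user.
def start_verification(email)
  token = SecureRandom.urlsafe_base64(24)
  PENDING[token] = email
  token
end

# Returns the verified email if the token is known, otherwise nil.
# Deleting the entry makes each token single-use.
def confirm(token)
  PENDING.delete(token)
end

token = start_verification('user@host.com')
puts confirm(token).inspect    # "user@host.com"
puts confirm(token).inspect    # nil (already used)
puts confirm('bogus').inspect  # nil
```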

Upvotes: 0
