Reputation: 110950
I'd like to create a method like so:
def email_is_junk(email_address)
end
Where it returns true if the email is junk and false if it is not. The tricky part is that I want the logic to be based on conditions like the following:
Any suggestions on how to write this method without dozens of if blocks and regexes?
Upvotes: 1
Views: 234
Reputation: 160551
Look at Ruby's Regexp.union and Regexp.escape methods. They make it easy to generate regex patterns based on text or regex strings.
This is from the union docs:
Return a Regexp object that is the union of the given patterns, i.e., will match any of its parts. The patterns can be Regexp objects, in which case their options will be preserved, or Strings. If no patterns are given, returns /(?!)/. The behavior is unspecified if any given pattern contains capture.
Regexp.union #=> /(?!)/
Regexp.union("penzance") #=> /penzance/
Regexp.union("a+b*c") #=> /a\+b\*c/
Regexp.union("skiing", "sledding") #=> /skiing|sledding/
Regexp.union(["skiing", "sledding"]) #=> /skiing|sledding/
Regexp.union(/dogs/, /cats/i) #=> /(?-mix:dogs)|(?i-mx:cats)/
And from the escape docs:
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
Regexp.escape('\*?{}.') #=> \\\*\?\{\}\.
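To see why escaping matters for a rule like a domain name, here is a minimal sketch. The test string 'craigslistXorg' is made up for illustration and is not from the question:

```ruby
# An unescaped "." is a regex wildcard, so the loose pattern matches too much.
loose  = Regexp.new('craigslist.org')
strict = Regexp.new(Regexp.escape('craigslist.org'))

loose_hit  = loose.match?('craigslistXorg')   # "." matches the "X"
strict_hit = strict.match?('craigslistXorg')  # escaped "." needs a literal dot
real_hit   = strict.match?('craigslist.org')

p [loose_hit, strict_hit, real_hit]  # => [true, false, true]
```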
This is a starting point:
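patterns = [
  /.+?\+.+?@/
]
# Local parts (the text before the "@") that suggest a junk address.
strings = [
  'do-not-reply', 'support', 'test', 'service', 'tips', 'twitter', 'alerts', 'survey'
]
# Domains follow the "@", so they need "@" in front rather than behind.
domains = [
  'craigslist.org'
]
regex = Regexp.union(
  *patterns,
  *strings.map{ |s| Regexp.new( Regexp.escape("#{ s }@"), Regexp::IGNORECASE ) },
  *domains.map{ |d| Regexp.new( Regexp.escape("@#{ d }"), Regexp::IGNORECASE ) }
)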
pp regex
>> /(?-mix:.+?\+.+?@)|(?i-mx:do\-not\-reply@)|(?i-mx:support@)|(?i-mx:test@)|(?i-mx:service@)|(?i-mx:tips@)|(?i-mx:twitter@)|(?i-mx:alerts@)|(?i-mx:survey@)|(?i-mx
Applying the above:
sample_email_addresses = %w[
user
user+foo
do-not-reply
support
service
tips
twitter
alerts
survey
].map{ |e| e << '@host.com' }
pp sample_email_addresses.map{ |e| [e, !!e[regex]] }
>> [["user@host.com", false],
>> ["user+foo@host.com", true],
>> ["do-not-reply@host.com", true],
>> ["support@host.com", true],
>> ["service@host.com", true],
>> ["tips@host.com", true],
>> ["twitter@host.com", true],
>> ["alerts@host.com", true],
>> ["survey@host.com", true]]
The output shows a list containing each tested address. true means the address triggered a hit in the regex, meaning there was something wrong with it, and false means it was clean and considered safe.
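The true/false values come from `!!e[regex]`: String#[] with a regex returns the matched substring or nil, and `!!` turns that into a boolean. A small sketch, using a single made-up rule rather than the full union:

```ruby
regex = /support@/i

hit  = 'support@host.com'[regex]  # the matched substring
miss = 'user@host.com'[regex]     # nil, because nothing matched

hit_flag  = !!hit   # a string is truthy
miss_flag = !!miss  # nil is falsy

p hit, miss, hit_flag, miss_flag
```

Regexp#match? returns the same boolean directly, if you prefer to avoid the double negation.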
If you only want the ones that failed, i.e., matched the regex:
pp sample_email_addresses.select{ |e| e[regex] }
>> ["user+foo@host.com",
>> "do-not-reply@host.com",
>> "support@host.com",
>> "service@host.com",
>> "tips@host.com",
>> "twitter@host.com",
>> "alerts@host.com",
>> "survey@host.com"]
If you only want the ones that passed, i.e., didn't trigger a hit in the regex:
pp sample_email_addresses.reject{ |e| e[regex] }
>> ["user@host.com"]
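To tie this back to the method in the question, the union can be built once and wrapped. This is only a sketch: the constant name JUNK_REGEX is hypothetical, and the craigslist.org rule is written as a domain match after the "@", which is an assumption about what the question intended:

```ruby
# Built once at load time; the rule lists are taken from the example above.
JUNK_REGEX = Regexp.union(
  /.+?\+.+?@/,
  *%w[do-not-reply support test service tips twitter alerts survey].map { |s|
    Regexp.new(Regexp.escape("#{s}@"), Regexp::IGNORECASE)
  },
  /@craigslist\.org\z/i  # assumed: junk if the domain is craigslist.org
)

# Returns true if any rule matches, false otherwise.
def email_is_junk(email_address)
  JUNK_REGEX.match?(email_address)
end

p email_is_junk('user@host.com')           # => false
p email_is_junk('user+foo@host.com')       # => true
p email_is_junk('someone@craigslist.org')  # => true
```

Adding a rule then means adding one string or pattern to the list, not another if block.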
Upvotes: 1
Reputation: 18784
As an illustration of Zabba's comment above:
# Rules are plain substrings, so "+" and "." need no regex escaping.
USER_RULES = ['+', 'do-not-reply', 'support', 'test', 'service', 'tips', 'twitter', 'alerts', 'survey']
DOMAIN_RULES = ['craigslist.org']
def email_is_junk(email)
  return true unless email.include?('@') # return early if no @
  # Limit of 2 keeps domain a string (possibly empty) even for input like "user@"
  user, domain = email.split('@', 2)
  return true if USER_RULES.any? { |rule| user.include?(rule) }
  return true if DOMAIN_RULES.any? { |rule| domain.include?(rule) }
  false # reached the end without matching anything
end
Upvotes: 2
Reputation: 21996
Here's a Javascript version. Not sure it can be much simpler than:
// Each helper uses RegExp.prototype.test, which returns a boolean
// (String.prototype.match would return an array or null instead).
function isJunk(email) {
  return hasPlus(email) || supportLike(email) || craigsList(email);
}
function craigsList(email) {
  return /@craigslist\.org/.test(email);
}
function supportLike(email) {
  return /do-not-reply|support|test|service|tips|twitter|alerts|survey/.test(email);
}
function hasPlus(email) {
  return /\+.*@/.test(email);
}
This is only a heuristic, so it's not 100% accurate. If you still have problems, consider verifying the address by sending the user an email with a token in it.
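The token idea could look roughly like this, written in Ruby to match the question. Everything here is a hypothetical sketch: the PENDING hash stands in for real storage, the method names are made up, and actually sending the email is left out:

```ruby
require 'securerandom'

# Hypothetical in-memory store mapping token => email address.
# A real application would persist this and expire old tokens.
PENDING = {}

# Create a random token for the address; the caller would email it to the user.
def issue_token(email)
  token = SecureRandom.urlsafe_base64(24)
  PENDING[token] = email
  token
end

# Returns the confirmed email address, or nil if the token is unknown or already used.
def confirm(token)
  PENDING.delete(token)
end

token = issue_token('user@host.com')
p confirm(token)  # the address, on first use
p confirm(token)  # nil, because the token was consumed
```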
Upvotes: 0