Reputation: 1984
I'm trying to create a regular expression that will filter valid emails using PHP and have run into an issue that conflicts with what I understand of regular expressions. Here is the code that I am using.
if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/', $string)) {
return false;
}
Now from the materials that I've researched, this should allow content before the @ to be multiple letters, numbers, underscores and periods, then afterwards to allow multiple letters and numbers, then require a period, then two to four letters for the top level domain.
However, right now it ignores the requirement for the top-level domain section. For example, [email protected] is obviously valid (and should be), but a@b is also returned as valid, and I want it to be flagged as invalid.
I'm sure I'm missing something, but after browsing Google for an hour I'm at a loss as to what it could be. Anyone have an answer for this conundrum?
EDIT: The speed that answers arrive here makes this site superior to its competitors. Well done!
Upvotes: 1
Views: 3231
Reputation: 41040
From the page Comparing E-mail Address Validating Regular Expressions: Geert De Deckere from the Kohana project has developed a near perfect one:
/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d++)?$/iD
But there is also a built-in function in PHP, filter_var($email, FILTER_VALIDATE_EMAIL), but it seems to be under development. And there is another serious solution: PEAR:Validate. I think the PEAR solution is the best one.
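For reference, calling the built-in filter could look roughly like this. It is only a minimal sketch: isValidEmail is a hypothetical helper name, and how strict the filter is depends on the PHP version you run.

```php
<?php
// isValidEmail is a hypothetical helper name used only for this sketch.
function isValidEmail(string $email): bool
{
    // filter_var returns the filtered string on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isValidEmail('user@example.com')); // bool(true)
var_dump(isValidEmail('not-an-email'));     // bool(false)
```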
Upvotes: 3
Reputation: 49534
This is the most reasonable trade-off between the spec and real life that I have seen:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+
(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
Of course, you have to remove the line breaks, and you have to update it if more top-level domains become available.
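One way to remove the line breaks without losing readability is to concatenate the pieces in PHP. This is a sketch under a few assumptions that are not part of the pattern above: the ^ and $ anchors (so the whole string must match), the i flag (so the lower-case classes and [A-Z]{2} match either case), and the / delimiter, which is why the slash inside the character class is escaped.

```php
<?php
// The five pieces of the pattern above, concatenated so the line breaks disappear.
// Assumptions added for this sketch: ^ and $ anchors, the i flag, / as delimiter.
$pattern = '/^'
    . '[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+'
    . '(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*'
    . '@'
    . '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+'
    . '(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b'
    . '$/i';

// preg_match returns 1 on a match and 0 when the subject does not match.
var_dump(preg_match($pattern, 'user@example.com')); // int(1)
var_dump(preg_match($pattern, 'user@example'));     // int(0)
```

Keeping each piece on its own line also makes it easier to extend the list of top-level domains later.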
Upvotes: 1
Reputation: 28713
You should escape . when it's not inside a character class: '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/'
Otherwise it will match any character:
. - any character (but not the newline \n if not using the s modifier)
\. - a literal dot
[.] - a literal dot (inside a character class)
Upvotes: 6
Reputation: 47829
A single dot in a regular expression means "match any character". And that's exactly what it does when a top-level domain is missing (also when it's present, of course).
Thus you should change your code like this:
if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', $string)) {
return false;
}
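To see the difference the backslash makes, you can run both patterns against an address whose domain has no dot at all. This is just an illustrative sketch; the variable names and the test addresses are made up for the example.

```php
<?php
// The original pattern (unescaped dot) and the corrected one (escaped dot).
$unescaped = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/';
$escaped   = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/';

// No dot in the domain: the unescaped "." happily matches a letter instead.
var_dump(preg_match($unescaped, 'user@examplecom')); // int(1)
var_dump(preg_match($escaped, 'user@examplecom'));   // int(0)

// A real dot in the domain: both patterns accept it.
var_dump(preg_match($unescaped, 'user@example.com')); // int(1)
var_dump(preg_match($escaped, 'user@example.com'));   // int(1)
```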
And by the way: a lot more characters are allowed in the local part than what your regular expression currently allows for.
Upvotes: 0
Reputation: 116169
Rather than rolling your own, perhaps you should read the article How to Find or Validate an Email Address on Regular-Expressions.info. The article also discusses reasons why you might not want to validate an email address using a regular expression and provides 3 regular expressions that you might consider using instead of your own.
Upvotes: 5