canadiancreed

Reputation: 1984

Regular expression fun with emails; top level domain not required when it should be

I'm trying to create a regular expression that will filter valid emails using PHP, and I have run into an issue that conflicts with what I understand of regular expressions. Here is the code that I am using.

if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/', $string)) {
    return false;
}

Now from the materials that I've researched, this should allow content before the @ to be multiple letters, numbers, underscores and periods, then afterwards to allow multiple letters and numbers, then require a period, then two to four letters for the top level domain.

However, right now it ignores the requirement for the top level domain section. For example, [email protected] is obviously valid (and should be), but a@b is also returned as valid, and I want it to be flagged as invalid.

I'm sure I'm missing something, but after browsing Google for an hour I'm at a loss as to what it could be. Does anyone have an answer for this conundrum?

EDIT: The speed at which answers arrive here makes this site superior to its competitors. Well done!

Upvotes: 1

Views: 3231

Answers (6)

powtac

Reputation: 41040

From the page Comparing E-mail Address Validating Regular Expressions: Geert De Deckere from the Kohana project has developed a near perfect one:

/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d++)?$/iD

There is also a built-in PHP function, filter_var($email, FILTER_VALIDATE_EMAIL), but it seems to be under development. Another serious solution is PEAR:Validate. I think the PEAR solution is the best one.
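As a rough illustration of the built-in route, here is a minimal sketch. The helper name is_valid_email is made up for this example; filter_var() and FILTER_VALIDATE_EMAIL are the PHP built-ins mentioned above. Exactly which addresses the filter accepts depends on the PHP version, so treat the sample inputs as things to try rather than guaranteed results.

```php
<?php
// Hypothetical helper (not from the thread) wrapping PHP's built-in filter.
function is_valid_email(string $email): bool
{
    // filter_var() returns the filtered string on success and false on failure,
    // so compare strictly against false.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Try a few inputs and inspect what the installed PHP version reports.
foreach (['user@example.com', 'a@b', 'not-an-email'] as $candidate) {
    echo $candidate, ' => ';
    var_dump(is_valid_email($candidate));
}
```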

Upvotes: 3

John Gietzen

Reputation: 49534

This is the most reasonable trade-off between the spec and real life that I have seen:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+
(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b

Of course, you have to remove the line breaks, and you have to update it if more top-level domains become available.
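One possible way to use that pattern from PHP without joining the lines by hand is the x modifier, which tells PCRE to ignore whitespace outside character classes. The sketch below makes a few assumptions that are not part of the answer: it adds ^ and $ anchors so the whole string must match, adds the i and D modifiers, writes the two-letter alternative in lower case, and escapes / because / is used as the delimiter.

```php
<?php
// Nowdoc keeps the pattern literal: no PHP escaping of ' or $ is needed.
// Assumptions added here: ^ and $ anchors, the i, x and D modifiers,
// and \/ in place of / inside the character classes.
$pattern = <<<'REGEX'
/^
[a-z0-9!#$%&'*+\/=?^_`{|}~-]+
(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*
@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
$/ixD
REGEX;

foreach (['user@example.com', 'a@b', 'user@example.xyz'] as $candidate) {
    // preg_match() returns 1 on a match and 0 on no match.
    $verdict = preg_match($pattern, $candidate) === 1 ? 'match' : 'no match';
    echo $candidate, ' => ', $verdict, PHP_EOL;
}
```

The last sample shows the maintenance cost mentioned above: a top-level domain that is not in the list is rejected until the list is updated.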

Upvotes: 1

Ivan Nevostruev

Reputation: 28713

You should escape . when it is not inside a character class: '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/' Otherwise it will match any character:

  • . - any character (except the newline \n, unless the s modifier is used)
  • \. - a literal dot
  • [.] - a literal dot (inside a character class)
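The three forms in the list can be compared side by side with a short sketch. The variable names and sample addresses are chosen for this example only; the first pattern is the one from the question, and the other two differ only in how the dot is written.

```php
<?php
// Pattern from the question: the dot before the TLD is unescaped.
$unescaped = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/';
// Same pattern with the dot escaped.
$escaped   = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/';
// Same pattern with the dot placed inside a character class.
$bracketed = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+[.][a-zA-Z]{2,4}$/';

$patterns = ['unescaped' => $unescaped, 'escaped' => $escaped, 'bracketed' => $bracketed];

foreach (['user@example.com', 'user@examplecom'] as $candidate) {
    foreach ($patterns as $label => $pattern) {
        // preg_match() returns 1 on a match and 0 on no match.
        echo $candidate, ' / ', $label, ': ', preg_match($pattern, $candidate), PHP_EOL;
    }
}
```

With the unescaped dot, user@examplecom still matches because the dot is free to consume one of the letters; the escaped and bracketed versions reject it.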

Upvotes: 6

innaM

Reputation: 47829

A single dot in a regular expression means "match any character". And that's exactly what it does when a top level domain is missing (and also when it's present, of course).

Thus you should change your code like this:

if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', $string)) {
    return false;
}

And by the way: a lot more characters are allowed in the local part than what your regular expression currently allows for.
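To make the last point concrete, here is a small sketch that runs the corrected pattern against a few addresses whose local parts use characters outside [-a-zA-Z0-9_.]. The sample addresses are invented for this example.

```php
<?php
// Corrected pattern from the code above (dot before the TLD escaped).
$corrected = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/';

// Illustrative addresses; + and ' are permitted in the local part
// but are not listed in the pattern's first character class.
$samples = ['user@example.com', 'first+tag@example.com', "o'brien@example.com"];

foreach ($samples as $address) {
    $result = preg_match($corrected, $address) === 1 ? 'accepted' : 'rejected';
    echo $address, ' => ', $result, PHP_EOL;
}
```

Only the first address is accepted; the other two are rejected purely because of the + and ' characters.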

Upvotes: 0

Thomas Owens

Reputation: 116169

Rather than rolling your own, perhaps you should read the article How to Find or Validate an Email Address on Regular-Expressions.info. The article also discusses reasons why you might not want to validate an email address using a regular expression and provides 3 regular expressions that you might consider using instead of your own.

Upvotes: 5
