Helmut

Reputation: 1377

regex for email validation: where is the error?

This sounds strange, but I've been using this function for quite a while now and "suddenly, from one day to the next" it no longer filters some addresses correctly. However, I cannot see why...

function validate_email($email)
{
/*
(Name) Letters, Numbers, Dots, Hyphens and Underscores
(@ sign)
(Domain) (with possible subdomain(s) ).
Contains only letters, numbers, dots and hyphens (up to 255 characters)
(. sign)
(Extension) Letters only (up to 10 (can be increased in the future) characters)
*/

$regex = '/([a-z0-9_.-]+)'. # name

'@'. # at

'([a-z0-9.-]+){2,255}'. # domain & possibly subdomains

'.'. # period

'([a-z]+){2,10}/i'; # domain extension 

if($email == '') { 
    return false;
}
else {
$eregi = preg_replace($regex, '', $email);
}

return empty($eregi) ? true : false;
}

For example, "some@gmail" is accepted as valid, so something seems to be wrong with the TLD part. Could anybody tell me why?

Thank you very much in advance!

Upvotes: 0

Views: 288

Answers (4)

Laoujin

Reputation: 10229

. means any character. You should escape it if you actually mean 'dot': \.

Your regex also has some other problems:

  • Uppercase letters only match because of the trailing /i modifier; without it you would need [a-zA-Z0-9]
  • No unicode characters are allowed in your regex (for example email addresses with é, ç, ... etc)
  • Some special characters such as + are in fact allowed in an email address
  • ...

I would keep the email validation very simple: check that an @ is present and pretty much leave it at that. If you really want to validate an email address fully, the regex becomes gruesome.

Check this SO answer for a more detailed explanation.

Upvotes: 2

Jacta

Reputation: 507

What about filter_var() with FILTER_VALIDATE_EMAIL?
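A minimal sketch of that suggestion, assuming PHP 7 or later for the type hints. The function name validate_email_builtin is made up for illustration and is not from the question; filter_var() and FILTER_VALIDATE_EMAIL are PHP built-ins.

```php
<?php
// Hypothetical wrapper: let PHP's built-in filter do the validation
// instead of a hand-written regex.
function validate_email_builtin(string $email): bool
{
    // filter_var() returns the filtered string on success and false on failure,
    // so compare strictly against false.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(validate_email_builtin('some@gmail'));     // the address from the question
var_dump(validate_email_builtin('some@gmail.com'));
```

Note that the built-in filter has its own idea of what counts as valid, so it is worth trying it against the kinds of addresses you actually expect.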

Upvotes: 0

Philipp

Reputation: 69663

I think the error is in this line:

'.'. # period 

You mean a literal period here. But periods have a special meaning in regular expressions (they mean "any character").

You need to escape it with a backslash.
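To see the difference, here is a small sketch comparing an unescaped and an escaped period. The two patterns are simplified, anchored variants of the one in the question (the ^ and $ anchors, the use of preg_match(), and applying {2,10} directly to the character class are assumptions made for this demo), so treat it as an illustration rather than a drop-in replacement.

```php
<?php
// Unescaped: "." matches any single character, so in "gmail" the regex can
// use "gm" as the domain, "a" as the "period" and "il" as the extension.
$unescaped = '/^[a-z0-9_.-]+@[a-z0-9.-]+.[a-z]{2,10}$/i';

// Escaped: "\." matches only a literal period.
$escaped = '/^[a-z0-9_.-]+@[a-z0-9.-]+\.[a-z]{2,10}$/i';

// preg_match() returns 1 on a match and 0 otherwise.
var_dump(preg_match($unescaped, 'some@gmail'));   // matches although there is no dot
var_dump(preg_match($escaped, 'some@gmail'));     // no literal dot, no match
var_dump(preg_match($escaped, 'some@gmail.com')); // literal dot present, matches
```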

Upvotes: 1

Joey

Reputation: 354566

What you commented with "period":

'.'. # period

is in fact a placeholder for any character. It should be \. instead.

However, you're overcomplicating things. Such validation should exist to reject either empty fields or obviously wrong input (e.g. a name put in the email field). So in my experience the best check is just to see whether it contains an @ and not to worry too much about getting the structure right. You can in fact write a regex which will faithfully accept any valid email address and reject any invalid one. It's a monster spanning about a screen of text. Don't do that. KISS.
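A rough sketch of such a KISS check, assuming PHP 7 or later. The name looks_like_email is hypothetical, and requiring at least one character on each side of the @ is an extra assumption on top of "contains an @".

```php
<?php
// Hypothetical helper: reject empty input and anything without a usable "@".
function looks_like_email(string $email): bool
{
    $email = trim($email);

    // strpos() returns false when "@" is absent, otherwise its zero-based position.
    $at = strpos($email, '@');

    // Assumption: demand at least one character before and after the first "@".
    return $at !== false && $at > 0 && $at < strlen($email) - 1;
}

var_dump(looks_like_email('John Smith'));  // a name typed into the email field
var_dump(looks_like_email('some@gmail'));  // passes: structure is deliberately not checked
```

This intentionally lets through addresses like "some@gmail"; if that matters, the check has to be stricter than what is suggested here.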

Upvotes: 1
