Reputation: 514
My regex is:
$regex = '/(?<=Α: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';
My content includes, among other things:
Q: Email Address
A: [email protected]
Rad Software Regular Expression Designer says that it should work.
Various online sites return the correct results.
If I remove the (?<=Α: ) lookbehind the regex returns all emails correctly.
When I run it from php it returns no matches.
What's going on?
I've also used the same type of regex (i.e. (?<=Email: )) with different content, and it works just fine in that case.
Upvotes: 0
Views: 117
Reputation: 89649
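The A character in your subject is the ordinary Latin capital A, code point 65 (the same in ASCII and Unicode). The Α you use in the lookbehind of your pattern is a different character with code point 913 (U+0391, Greek capital Alpha). They look the same but are different characters, so the lookbehind never matches.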
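One way to spot a look-alike character like this is to list the non-ASCII characters in the pattern and print their bytes. The following is a minimal sketch, assuming PHP 7 or later and UTF-8 strings; `find_non_ascii` is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper: return the UTF-8 bytes (as hex) of every
// non-ASCII character found in $s.
function find_non_ascii(string $s): array
{
    $found = array();
    // The u modifier treats the subject as UTF-8, so each match is one whole character.
    if (preg_match_all('/[^\x00-\x7F]/u', $s, $m)) {
        foreach ($m[0] as $char) {
            $found[] = bin2hex($char);
        }
    }
    return $found;
}

$greek = '(?<=' . "\u{0391}" . ': )'; // lookbehind written with Greek capital Alpha (U+0391)
$latin = '(?<=A: )';                   // lookbehind written with Latin capital A (U+0041)

print_r(find_non_ascii($greek)); // one entry: ce91
print_r(find_non_ascii($latin)); // empty array
```

Running the same check on the pattern string copied from your editor should show whether a stray non-ASCII character has slipped in.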
Upvotes: 0
Reputation: 7878
This is my newer monster script for checking whether an e-mail address "validates" or not. You can feed it strange input and break it, but in production it handles 99.99999999% of the problems I've encountered. Most of what it gets wrong is false positives caused by typos.
<?php
$pattern = '!^[^@\s]+@[^.@\s]+\.[^@\s]+$!';
$examples = array(
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'bad.email@google',
'@google.com',
'my@[email protected]',
'my [email protected]',
);
foreach ($examples as $test_mail) {
    if (preg_match($pattern, $test_mail)) {
        echo ("$test_mail - passes\n");
    } else {
        echo ("$test_mail - fails\n");
    }
}
?>
Output
Unless there's a reason for the look-behind, you can match all of the emails in the string with preg_match_all(). Since you're searching inside a larger string rather than testing a single value, you would modify the regex slightly:
$string_only_pattern = '!\s([^@\s]+@[^.@\s]+\.[^@\s]+)\s!s';
$mystring = '
[email protected] - passes
[email protected] - passes
[email protected] - passes
[email protected] - fails
bad.email@google - fails
@google.com - fails
my@[email protected] - fails
my [email protected] - fails
';
preg_match_all($string_only_pattern,$mystring,$matches);
print_r ($matches[1]);
Output from string only
Array
(
[0] => [email protected]
[1] => [email protected]
[2] => [email protected]
[3] => [email protected]
)
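One thing to keep in mind with the string-only pattern above is that it consumes the whitespace on both sides of each address, so an address at the very start or end of the string, or two addresses separated by a single space, can be missed. A possible variant, sketched here as an assumption rather than as part of the original answer, asserts the boundaries with lookarounds instead. The sample addresses are hypothetical.

```php
<?php
// Pattern from the answer: a whitespace character must surround each address.
$consuming = '!\s([^@\s]+@[^.@\s]+\.[^@\s]+)\s!s';

// Assumed variant: require "no non-whitespace neighbour" without consuming anything.
$lookaround = '!(?<!\S)([^@\s]+@[^.@\s]+\.[^@\s]+)(?!\S)!';

// Hypothetical input with no leading or trailing whitespace.
$text = 'first@example.com second@example.org';

preg_match_all($consuming, $text, $a);
preg_match_all($lookaround, $text, $b);

print_r($a[1]); // empty: neither address has whitespace on both sides
print_r($b[1]); // both addresses
```

The lookaround version still rejects inputs such as an address with two @ signs, because the character classes themselves are unchanged.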
Upvotes: 1
Reputation: 782785
The problem is that your regular expression contains Α, which is not the plain Latin letter but a look-alike (the Greek capital Alpha), while the content contains A, the ordinary Latin letter. So the lookbehind doesn't match.
I changed the regex to:
$regex = '/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';
and it works.
Upvotes: 0
Reputation: 786359
You are most likely not using the DOTALL flag s here, which makes the dot match newlines as well in your regex:
$str = <<< EOF
Q: Email Address
A: name@example.com
EOF;
if (preg_match_all('/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/s',
$str, $arr))
print_r($arr);
OUTPUT:
Array
(
    [0] => Array
        (
            [0] => name@example.com
        )

    [1] => Array
        (
            [0] => name@example.com
        )

    [2] => Array
        (
            [0] => name
        )

    [3] => Array
        (
            [0] => example.
        )

    [4] => Array
        (
            [0] => com
        )

)
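As a side check, the pattern contains no unescaped dot, so the s modifier should not change what it matches. The sketch below runs the same expression with and without s on an assumed sample input. The character class is written as `[\w.-]` here because PCRE2-based PHP versions (7.3 and later) may reject `[\w-\.]` as an invalid range; that rewrite is my assumption, not part of the original pattern.

```php
<?php
// Assumed sample input, modelled on the question's content.
$str = "Q: Email Address\nA: name@example.com\n";

// Pattern body with a Latin A in the lookbehind; hyphen moved to the end of the class.
$base = '(?<=A: )(([\w.-]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))';

$with    = preg_match_all('/' . $base . '/s', $str, $m1);
$without = preg_match_all('/' . $base . '/', $str, $m2);

var_dump($with, $without); // both counts are 1
var_dump($m1 === $m2);     // bool(true): identical captures either way
```

In this sketch the match appears with or without the modifier, which suggests the Latin A in the lookbehind is what makes the difference.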
Upvotes: 1
Reputation:
Outside of your regex issue itself, you should really consider not writing your own e-mail address regex parser. See the Stack Overflow post "Using a regular expression to validate an email address" for why. The upshot: the RFC is long and demanding on your regex abilities.
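If the goal is only to check whether a single value looks like an address, one alternative to a hand-written pattern is PHP's built-in filter. This is a minimal sketch, assuming the filter extension is available (it is enabled by default); the sample values are hypothetical, and the filter is not a full RFC implementation either.

```php
<?php
// Hypothetical sample values, not taken from the thread.
$candidates = array('name@example.com', '@google.com', 'my name@example.com');

foreach ($candidates as $candidate) {
    // filter_var() returns the filtered string on success and false on failure.
    $ok = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
    echo $candidate . ' - ' . ($ok ? 'passes' : 'fails') . "\n";
}
```

This only validates one value at a time, so pulling addresses out of a larger text still needs a pattern or some other way to split the input first.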
Upvotes: 0