jimmy

Reputation: 514

Regex in PHP not working

My regex is:

$regex = '/(?<=Α: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';

My content among others is:

Q: Email Address 
A: name@example.com

Rad Software Regular Expression Designer says that it should work.

Various online sites return the correct results.

If I remove the (?<=Α: ) lookbehind the regex returns all emails correctly.

When I run it from php it returns no matches.

What's going on?

I've also used this type of regex (i.e. (?<=Email: )) with different content, and it works just fine in that case.

Upvotes: 0

Views: 117

Answers (5)

Casimir et Hippolyte

Reputation: 89649

The A character in your subject is the ordinary Latin letter with code 65 (ASCII and Unicode). But the Α you use in the lookbehind of your pattern has code 913 (Unicode, Greek capital alpha). They look alike but are different characters.
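
One way to confirm this is to dump the raw bytes of each character. This is a minimal sketch, assuming the file is saved as UTF-8 and PHP 7.0 or later (for the "\u{...}" escape); the variable names are only illustrative:

```php
<?php
// Latin capital A (U+0041, decimal 65), as it appears in the subject.
$latin = "A";
// Greek capital alpha (U+0391, decimal 913), as it appears in the pattern.
$greek = "\u{0391}";

// bin2hex() prints the underlying UTF-8 bytes, which exposes look-alikes.
echo bin2hex($latin), "\n"; // 41
echo bin2hex($greek), "\n"; // ce91

// The two strings render the same but are not equal.
var_dump($latin === $greek); // bool(false)
```

Pasting the suspicious character from the pattern in place of the escape gives the same result.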

Upvotes: 0

AbsoluteƵERØ

Reputation: 7878

This is my newer monster script for verifying whether an e-mail "validates" or not. You can feed it strange things and break it, but in production this handles 99.99999999% of the problems I've encountered. Most of the false positives I do see come from typos.

<?php

$pattern = '!^[^@\s]+@[^.@\s]+\.[^@\s]+$!';

$examples = array(
  '[email protected]',
  '[email protected]',
  '[email protected]',
  '[email protected]',
  'bad.email@google',
  '@google.com',
  'my@[email protected]',
  'my [email protected]',
);


foreach($examples as $test_mail){
    if(preg_match($pattern,$test_mail)){
      echo ("$test_mail - passes\n");   
    } else {
      echo ("$test_mail - fails\n");                
    }
}

?>

Output

  1. [email protected] - passes
  2. [email protected] - passes
  3. [email protected] - passes
  4. [email protected] - fails
  5. bad.email@google - fails
  6. @google.com - fails
  7. my@[email protected] - fails
  8. my [email protected] - fails

Unless there's a reason for the look-behind, you can match all of the emails in the string with preg_match_all(). Since you're working with a string, you would modify the regex slightly:

$string_only_pattern = '!\s([^@\s]+@[^.@\s]+\.[^@\s]+)\s!s';

$mystring = '
[email protected] - passes
[email protected] - passes
[email protected] - passes
[email protected] - fails
bad.email@google - fails
@google.com - fails
my@[email protected] - fails
my [email protected] - fails
';

preg_match_all($string_only_pattern,$mystring,$matches);

print_r ($matches[1]);

Output from string only

Array
(
    [0] => [email protected]
    [1] => [email protected]
    [2] => [email protected]
    [3] => [email protected]
)

Upvotes: 1

Barmar

Reputation: 782785

The problem is that your regular expression contains Α (the Greek capital letter alpha, U+0391), but the content contains the Latin letter A (U+0041). They look identical but are different characters, so the lookbehind doesn't match.

I changed the regex to:

$regex = '/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';

and it works.
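
The difference can be reproduced side by side. This is only a sketch: the sample address is a placeholder, the variable names are made up for illustration, and the hyphen in the character class has been moved to the end so that it is unambiguously a literal.

```php
<?php
// Placeholder input shaped like the content in the question.
$content = "Q: Email Address \nA: name@example.com";

// Email part of the pattern, with the hyphen last in the character class.
$email = '([\w.-]+@(?:\w+\.)+[a-zA-Z]{2,4})';

// Lookbehind with the Latin A (U+0041).
$latinPattern = '/(?<=A: )' . $email . '/';
// Lookbehind with the Greek capital alpha (U+0391).
$greekPattern = '/(?<=' . "\u{0391}" . ': )' . $email . '/';

var_dump(preg_match($latinPattern, $content, $m)); // int(1)
echo $m[1], "\n";                                  // name@example.com
var_dump(preg_match($greekPattern, $content));     // int(0)
```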

Upvotes: 0

anubhava

Reputation: 786359

You are most likely not using the DOTALL flag s here, which makes the dot match newlines as well in your regex:

$str = <<< EOF
Q: Email Address 
A: name@example.com
EOF;
if (preg_match_all('/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/s', 
                   $str, $arr))
   print_r($arr);

OUTPUT:

Array
(
    [0] => Array
        (
            [0] => name@example.com
        )

    [1] => Array
        (
            [0] => name@example.com
        )

    [2] => Array
        (
            [0] => name
        )

    [3] => Array
        (
            [0] => example.
        )

    [4] => Array
        (
            [0] => com
        )

)

Upvotes: 1

user559633

Reputation:

Outside of your regex issue itself, you should really consider not writing your own e-mail address regex parser. See the Stack Overflow post "Using a regular expression to validate an email address" for why. The upshot: the RFC is long and demanding on your regex abilities.
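
As one possible alternative to a hand-written pattern, PHP ships a built-in validator through filter_var(). The sketch below is an illustration rather than a recommendation from the linked post; the sample addresses are hypothetical, and the built-in filter does not cover everything the RFC allows.

```php
<?php
// Hypothetical inputs for illustration only.
$candidates = ['name@example.com', 'bad.email@google', '@google.com'];

foreach ($candidates as $candidate) {
    // filter_var() returns the filtered string on success, false on failure.
    $ok = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
    echo $candidate, ' - ', $ok ? 'passes' : 'fails', "\n";
}
```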

Upvotes: 0
