morandi3

Reputation: 1115

Regex split email address

I need some help with a PHP regex. I want to "split" the email address "johndoe@example.com" into "johndoe" and "@example.com".

So far I have this: preg_match('/<?([^<]+?)@/', 'johndoe@example.com', $matches); and I get Array ( [0] => johndoe@ [1] => johndoe)

How should I change the regex?
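A minimal sketch of one possible change to the pattern above (not taken from any of the answers): keep the lazy first group and add a second group that captures the @ together with the rest of the address. The trailing >? is an assumption that mirrors the optional leading <? in the original pattern, and split_email is a hypothetical helper name.

```php
<?php
// Hypothetical helper: splits "local@domain" into the local part and "@domain".
// Group 1 is lazy, so it stops at the first "@"; group 2 keeps the "@" and
// everything up to an optional closing ">".
function split_email(string $email): ?array
{
    if (preg_match('/<?([^<]+?)(@[^>]+)>?/', $email, $matches) !== 1) {
        return null; // no "@" found
    }
    return [$matches[1], $matches[2]];
}

[$user, $domain] = split_email('johndoe@example.com');
echo $user, ' ', $domain, "\n"; // prints: johndoe @example.com
```

Like the asker's original pattern, this splits at the first @, so it shares the limitation Brogan describes below for addresses with a quoted @ in the local part.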

Upvotes: 13

Views: 25984

Answers (7)

Chris Rudd

Reputation: 809

I've created a general regex for this that validates and creates named captures of the full email, the user, and the domain.

Regex:

(?<email>(?<mailbox>(?:\w|[!#$%&'*+/=?^`{|}~-])+(?:\.(?:\w|[!#$%&'*+/=?^`{|}~-])+)*)@(?<full_domain>(?<subdomains>(?:(?:[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.)*)(?<root_domain>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.(?<tld>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)))

Explanation:

(?<email>                          #  start Full Email capture
  (?<mailbox>                      #    Mailbox
    (?:\w|[!#$%&'*+/=?^`{|}~-])+   #      letter, number, underscore, or any of these special characters
    (?:                            #      Group: allow . in the middle of mailbox; can have multiple but can't be consecutive (no john..smith)
      \.                           #        match "." 
      (?:\w|[!#$%&'*+/=?^`{|}~-])+ #        letter, number, underscore, or any of these special characters
    )*                             #      zero or more times, so a mailbox needs no dot
  )                                #    close Mailbox capture
  @                                #    match "@"
  (?<full_domain>                  #    Full Domain (including subdomains and tld)
    (?<subdomains>                 #      All Subdomains
      (?:                          #        label + '.' (so we can allow 0 or more)
        (?:                        #          label text
          [^\W\d_]                 #            start with a letter (\W is the inverse of \w so we end up with \w minus numbers and _)
          (?:                      #            paired with a ? to allow single letter domains
            (?:[^\W_]|-)+          #              allow letters, numbers, hyphens, but not underscore
            [^\W_]                 #              if domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
          )?                       #            allow one letter sub domains
        )                          #          end label text
      \.)*                         #        allow 0 or more subdomains separated by '.'
    )                              #      close All Subdomains capture
    (?<root_domain>                #      Root Domain
      [^\W\d_]                     #        start with a letter
      (?:                          #        paired with ? to make characters after the first optional
        (?:[^\W_]|-)+              #          allow letters, numbers, hyphens
        [^\W_]                     #          if domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
      )?                           #        allow one letter domains
    )                              #      close Root Domain capture
    \.                             #      separator
    (?<tld>                        #      TLD
      [^\W\d_]                     #        start with a letter
      (?:                          #        paired with ? to make characters after the first optional
        (?:[^\W_]|-)+              #          allow letters, numbers, hyphens
        [^\W_]                     #          if domain is more than one character, it has to end with a letter or digit (not a hyphen)
      )?                           #        allow single letter tld
    )                              #      close TLD capture
  )                                #    close Full Domain capture
)                                  #  close Full Email capture
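For PHP users, here is a sketch of how the regex above might be used with preg_match. Assumptions: the pattern is wrapped in / delimiters, so the two / characters inside the character classes are escaped as \/, and split_email_named is a hypothetical helper name. The named groups show up as string keys in the matches array.

```php
<?php
// Hypothetical helper around the named-capture regex from this answer.
function split_email_named(string $email): ?array
{
    // Nowdoc keeps the quote, "$" and backslashes literal; only the two "/"
    // inside the character classes are escaped because "/" is the delimiter.
    $pattern = <<<'RE'
/(?<email>(?<mailbox>(?:\w|[!#$%&'*+\/=?^`{|}~-])+(?:\.(?:\w|[!#$%&'*+\/=?^`{|}~-])+)*)@(?<full_domain>(?<subdomains>(?:(?:[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.)*)(?<root_domain>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.(?<tld>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)))/
RE;

    if (preg_match($pattern, $email, $m) !== 1) {
        return null; // nothing address-like found
    }
    return [
        'mailbox'     => $m['mailbox'],
        'domain'      => '@' . $m['full_domain'], // re-attach "@" as the question asks
        'subdomains'  => $m['subdomains'],
        'root_domain' => $m['root_domain'],
        'tld'         => $m['tld'],
    ];
}

$parts = split_email_named('johndoe@example.com');
echo $parts['mailbox'], ' ', $parts['domain'], "\n"; // prints: johndoe @example.com
```

The pattern is not anchored, so preg_match will also find an address embedded in longer text; adding ^ and $ inside the delimiters would require the whole input to be an address.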

Notes

Generalized Regex: I've posted just the regex itself, not the PHP-specific code. This makes it easier to use for other people who find this question by its title, "Regex Split Email Address".

Feature Compatibility: Not all regex engines support named captures. If you have trouble with it, test it against your text on Regexr (check the Details panel to see the captures). If it works there, double-check whether the regex engine you're using supports named captures.

Domain RFC: The domain part is also based on the domain-name RFC, not just RFC 2822.

Dangerous Characters: I have explicitly included characters such as ', $ and ! both to make it clear that the mail RFC allows them and to make them easy to remove if your system must disallow some of them due to special processing requirements (for example, blocking possible SQL injection attacks).

No Escape: For the mailbox name I've included only the dot-atom format; I've intentionally excluded dot- or slash-escaped support.

Subtle Letters: For some parts I've used [^\W\d_] instead of [a-zA-Z] to improve support for languages other than English.

Out of Bounds: Due to idiosyncrasies in capture-group processing in some systems, I've used + in place of {,61}. If you're using it somewhere that might be vulnerable to buffer overflow attacks, remember to bound your inputs.

Credits: Modified from community post by Tripleaxis, which was in turn taken from the .net helpfiles

Upvotes: -1

Brogan

Reputation: 748

Some of the previous answers are wrong: a valid email address can, in fact, contain more than one @ symbol, provided the extra ones are inside dot-delimited, quoted text. See the following example:

$email = 'a."b@c"[email protected]';
echo (filter_var($email, FILTER_VALIDATE_EMAIL) ? 'V' : 'Inv'), 'alid email format.';

Valid email format.


Multiple delimited blocks of text and any number of @ symbols can appear. Both of these examples are valid email addresses:

$email = 'a."b@c".d."@"[email protected]';
$email = '/."@@@@@@"./@a.b';

Using Michael Berkowski's explode answer, this email address would be split like this:

$email = 'a."b@c"[email protected]';
$parts = explode('@', $email);
$user = $parts[0];
$domain = '@' . $parts[1];

User: a."b"
Domain: @c".d


Anyone using that solution should beware of potential abuse. Accepting an email address based on these outputs and then inserting $email into a database could have negative implications.

$email = 'a."b@c".d@INSERT BAD STUFF HERE';

The functions below are only accurate as long as filter_var is used to validate the address first.

From the left:

Here is a simple solution, using neither regex nor explode, that finds the first @ not contained within delimited, quoted text. filter_var treats nested delimited text as invalid, so finding the proper @ is a very simple search.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $a = '"';
    $b = '.';
    $c = '@';
    $d = strlen($email);
    $contained = false;
    for($i = 0; $i < $d; ++$i) {
        if($contained) {
            if($email[$i] === $a && $email[$i + 1] === $b) {
                $contained = false;
                ++$i;
            }
        }
        elseif($email[$i] === $c)
            break;
        elseif($email[$i] === $b && $email[$i + 1] === $a) {
            $contained = true;
            ++$i;
        }
    }
    $local = substr($email, 0, $i);
    $domain = substr($email, $i);
}

Here is the same code tucked inside a function.

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $a = '"'; // quote
    $b = '.'; // dot
    $c = '@'; // at sign
    $d = strlen($email);
    $contained = false; // true while inside a ."..." block
    for($i = 0; $i < $d; ++$i) {
        if($contained) {
            // A closing quote followed by a dot ends the quoted block.
            if($email[$i] === $a && $email[$i + 1] === $b) {
                $contained = false;
                ++$i;
            }
        }
        elseif($email[$i] === $c) // first @ outside quotes: the real separator
            break;
        elseif($email[$i] === $b && $email[$i + 1] === $a) {
            // A dot followed by a quote opens a quoted block.
            $contained = true;
            ++$i;
        }
    }
    return array('local' => substr($email, 0, $i), 'domain' => substr($email, $i));
}

In use:

$email = 'a."b@c".x."@".d.e@f.g';
$email = parse_email($email);
if($email !== false)
    print_r($email);
else
    echo 'Bad email address.';

Array ( [local] => a."b@c".x."@".d.e [domain] => @f.g )

$email = 'a."b@c".x."@".d.e@f.g@';
$email = parse_email($email);
if($email !== false)
    print_r($email);
else
    echo 'Bad email address.';

Bad email address.


From the right:

After testing filter_var and researching what counts as a valid domain name (hostnames separated by dots), I wrote this function for better performance. In a valid email address, the last @ should be the true @, because the @ symbol never appears in the domain of a valid email address.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $at = strrpos($email, '@'); // position of the last @
    $local = substr($email, 0, $at);
    $domain = substr($email, $at);
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $a = strrpos($email, '@');
    return array('local' => substr($email, 0, $a), 'domain' => substr($email, $a));
}

Or using explode and implode:

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = explode('@', $email);
    $domain = '@' . array_pop($local);
    $local = implode('@', $local);
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = explode('@', $email);
    $domain = '@' . array_pop($email);
    return array('local' => implode('@', $email), 'domain' => $domain);
}

If you would still like to use a regex, splitting the string starting from the end of a valid email address is the safest option.

/(.*)(@.*)$/

(.*) Matches anything; because it is greedy, it runs up to the last @.
(@.*) Matches an @ symbol followed by anything.
$ End of the string.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $domain = $local[1];
    $local = $local[0];
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    return array('local' => $email[0], 'domain' => $email[1]);
}

Or

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    preg_match('/(.*)(@.*)$/', $email, $matches);
    $local = $matches[1];
    $domain = $matches[2];
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    preg_match('/(.*)(@.*)$/', $email, $matches);
    return array('local' => $matches[1], 'domain' => $matches[2]);
}

Upvotes: 9

umutkeskin

Reputation: 149

Use a regular expression. For example:

$mailadress = "email@company.com";
$exp_arr= preg_match_all("/(.*)@(.*)\.(.*)/",$mailadress,$newarr, PREG_SET_ORDER); 

/*
Array output:
Array
(
    [0] => Array
        (
            [0] => email@company.com
            [1] => email
            [2] => company
            [3] => com
        )

)
*/
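A sketch showing how the three groups behave when the domain contains more than one dot, using a made-up address and preg_match (one match is enough here). Both (.*) groups are greedy, so the split happens at the last @ and at the last dot after it.

```php
<?php
// Made-up sample address with a subdomain.
$address = 'jane@mail.company.com';
preg_match('/(.*)@(.*)\.(.*)/', $address, $parts);

// Group 2 keeps "mail.company" because the greedy (.*) only backtracks to the last ".".
echo $parts[1], ' | ', $parts[2], ' | ', $parts[3], "\n";
// prints: jane | mail.company | com
```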

Upvotes: 0

xDaizu

Reputation: 1061

Answer

$parts = explode("@", $email);
$domain = array_pop($parts);
$name = implode("@",$parts);

This solves both of Brogan's edge cases (a."b@c".d."@"[email protected] and /."@@@@@@"./@a.b), as demonstrated on Ideone.
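The three lines above, wrapped in a hypothetical split_at_last_at function and run on the second of Brogan's edge cases. As in this answer, the returned domain has no leading @.

```php
<?php
// Hypothetical wrapper around explode/array_pop/implode: everything before
// the last "@" is the local part, everything after it is the domain.
function split_at_last_at(string $email): array
{
    $parts = explode('@', $email);
    $domain = array_pop($parts);   // text after the last "@"
    $name = implode('@', $parts);  // rejoin any "@" that sat inside quotes
    return [$name, $domain];
}

[$name, $domain] = split_at_last_at('/."@@@@@@"./@a.b');
echo $name, ' | ', $domain, "\n"; // prints: /."@@@@@@"./ | a.b
```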


The currently accepted answer does not handle the multiple-"@" case.

I loved @Brogan's answer until I read his last sentence:

In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.

That claim is supported by another answer. If it's true, his answer seems unnecessarily complex.

Upvotes: 2

m4rinos

Reputation: 458

If you want a preg_match solution, you could also do something like this:

preg_match('/([^<]+)(@[^<]+)/','johndoe@example.com',$matches);
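A quick sketch of what the two groups contain, using a hypothetical wrapper named split_last_at. Because ([^<]+) is greedy, it gives back characters only until (@[^<]+) can match, so the split lands on the last @.

```php
<?php
// Hypothetical wrapper around the preg_match call above.
function split_last_at(string $email): ?array
{
    if (preg_match('/([^<]+)(@[^<]+)/', $email, $matches) !== 1) {
        return null; // needs at least one character before and after an "@"
    }
    // $matches[1]: text before the last "@"; $matches[2]: "@" plus the rest.
    return [$matches[1], $matches[2]];
}

[$user, $domain] = split_last_at('johndoe@example.com');
echo $user, ' | ', $domain, "\n"; // prints: johndoe | @example.com
```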

Upvotes: 0

middric

Reputation: 376

Using explode is probably the best approach here, but to do it with a regex you could use something like this:

/^([^@]*)(@.*)/

^ start of string

([^@]*) anything that is not an @ symbol ($matches[1])

(@.*) @ symbol followed by anything ($matches[2])
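A sketch showing where the groups end up in $matches, with a hypothetical split_first_at wrapper. Since ([^@]*) cannot contain an @, group 1 always stops at the first @, so an address with a quoted @ in its local part (see Brogan's answer) is split too early; the made-up second sample shows that.

```php
<?php
// Hypothetical wrapper around /^([^@]*)(@.*)/.
function split_first_at(string $email): ?array
{
    if (preg_match('/^([^@]*)(@.*)/', $email, $matches) !== 1) {
        return null; // no "@" in the input
    }
    // $matches[0] is the whole match; the capture groups start at index 1.
    return [$matches[1], $matches[2]];
}

foreach (['johndoe@example.com', 'a@b@c.d'] as $email) {
    [$user, $domain] = split_first_at($email);
    echo $user, ' | ', $domain, "\n";
}
// prints:
// johndoe | @example.com
// a | @b@c.d
```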

Upvotes: 3

Michael Berkowski

Reputation: 270607

$parts = explode('@', "johndoe@example.com");

$user = $parts[0];
// Stick the @ back onto the domain since it was chopped off.
$domain = "@" . $parts[1];
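A possible alternative, sketched with a hypothetical split_email_strstr helper: PHP's strstr() returns the string from the first @ onward with the @ included, or the part before it when its third argument is true, so no array or re-attached @ is needed. Like explode, it splits at the first @, so Brogan's quoted-@ cases would still be split in the wrong place.

```php
<?php
// Hypothetical helper built on strstr(); returns null when there is no "@".
function split_email_strstr(string $email): ?array
{
    $domain = strstr($email, '@');      // "@example.com", or false if no "@"
    if ($domain === false) {
        return null;
    }
    $user = strstr($email, '@', true);  // everything before the first "@"
    return [$user, $domain];
}

[$user, $domain] = split_email_strstr('johndoe@example.com');
echo $user, ' ', $domain, "\n"; // prints: johndoe @example.com
```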

Upvotes: 30
