Reputation: 1115
I need some help with a PHP regex. I want to "split" the email address "johndoe@example.com" into "johndoe" and "@example.com".
So far I have this: preg_match('/<?([^<]+?)@/', 'johndoe@example.com', $matches);
And I get: Array ( [0] => johndoe@ [1] => johndoe )
How do I need to change the regex?
Upvotes: 13
Views: 25984
Reputation: 809
I've created a general regex for this that validates and creates named captures of the full email, the user, and the domain.
Regex:
(?<email>(?<mailbox>(?:\w|[!#$%&'*+/=?^`{|}~-])+(?:\.(?:\w|[!#$%&'*+/=?^`{|}~-])+)*)@(?<full_domain>(?<subdomains>(?:(?:[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.)*)(?<root_domain>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.(?<tld>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)))
Explanation:
(?<email> # start Full Email capture
  (?<mailbox> # Mailbox
    (?:\w|[!#$%&'*+/=?^`{|}~-])+ # letter, number, underscore, or any of these special characters
    (?: # Group: allow . in the middle of mailbox; can have multiple but can't be consecutive (no john..smith)
      \. # match "."
      (?:\w|[!#$%&'*+/=?^`{|}~-])+ # letter, number, underscore, or any of these special characters
    )* # 0 or more dot-separated parts, so a mailbox without any dots is allowed
  ) # close Mailbox capture
  @ # match "@"
  (?<full_domain> # Full Domain (including subdomains and tld)
    (?<subdomains> # All Subdomains
      (?: # label + '.' (so we can allow 0 or more)
        (?: # label text
          [^\W\d_] # start with a letter (\W is the inverse of \w so we end up with \w minus numbers and _)
          (?: # paired with a ? to allow single-letter labels
            (?:[^\W_]|-)+ # allow letters, numbers, hyphens, but not underscore
            [^\W_] # if the label is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
          )? # allow one-letter subdomains
        ) # end label text
      \.)* # allow 0 or more subdomains separated by '.'
    ) # close All Subdomains capture
    (?<root_domain> # Root Domain
      [^\W\d_] # start with a letter
      (?: # paired with ? to make characters after the first optional
        (?:[^\W_]|-)+ # allow letters, numbers, hyphens
        [^\W_] # if the domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
      )? # allow one-letter domains
    ) # close Root Domain capture
    \. # separator
    (?<tld> # TLD
      [^\W\d_] # start with a letter
      (?: # paired with ? to make characters after the first optional
        (?:[^\W_]|-)+ # allow letters, numbers, hyphens
        [^\W_] # if the tld is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
      )? # allow a single-letter tld
    ) # close TLD capture
  ) # close Full Domain capture
) # close Full Email capture
Generalized Regex: I've posted just the regex itself, not the PHP-specific code around it, to make it easier to use for other people who find this question by its title, "Regex Split Email Address".
Feature Compatibility: Not all regex engines support named captures. If you have trouble with it, test it against your text on Regexr (check the Details panel to see the captures). If it works there, double-check whether the regex engine you're using supports named captures.
Domain RFC: The domain part follows the domain name RFC, not just RFC 2822.
Dangerous Characters: I have explicitly included characters such as ', $ and !, both to make it clear that the mail RFC allows them and to make them easy to remove if your system has to disallow some of them because of special processing requirements (like blocking possible SQL injection attacks).
No Escape: For the mailbox name I've only supported the dot-atom format; I've intentionally excluded quoted and backslash-escaped forms.
Subtle Letters: In some places I've used [^\W\d_] instead of [a-zA-Z] to improve support for languages other than English (this only helps where the engine's \w matches Unicode letters).
Out of Bounds: Because of idiosyncrasies in capture group processing in some systems, I've used + in place of {0,61}. If you're using it somewhere that might be vulnerable to buffer overflow attacks, remember to limit the length of your input.
Credits: Modified from a community post by Tripleaxis, which was in turn taken from the .NET help files.
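A minimal PHP sketch of applying this regex with preg_match to get the split asked for in the question. The pattern is the one above, except that the / inside the special-character class is escaped because / is used as the delimiter; split_with_regex is a hypothetical helper name. PHP's PCRE accepts (?<name>...) groups, so the captures come back as string keys in $m.

```php
<?php
// The generalized regex in a nowdoc, so quotes and $ need no PHP escaping.
// Only the "/" in the special-character class is escaped, because "/" is the delimiter.
$pattern = <<<'REGEX'
/(?<email>(?<mailbox>(?:\w|[!#$%&'*+\/=?^`{|}~-])+(?:\.(?:\w|[!#$%&'*+\/=?^`{|}~-])+)*)@(?<full_domain>(?<subdomains>(?:(?:[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.)*)(?<root_domain>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)\.(?<tld>[^\W\d_](?:(?:[^\W_]|-)+[^\W_])?)))/
REGEX;

// Hypothetical helper: returns the mailbox and "@" + full domain, or false if nothing matches.
function split_with_regex($email, $pattern)
{
    if (!preg_match($pattern, $email, $m)) {
        return false;
    }
    return array('local' => $m['mailbox'], 'domain' => '@' . $m['full_domain']);
}

print_r(split_with_regex('johndoe@example.com', $pattern));
```

For johndoe@example.com this should give johndoe and @example.com. The pattern is not anchored, so it will also pick an address out of longer text; wrap it in ^...$ if the whole string has to be an address.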
Upvotes: -1
Reputation: 748
Some of the previous answers are wrong, as a valid email address can, in fact, include more than a single @ symbol by containing it within dot delimited, quoted text. See the following example:
$email = 'a."b@c"[email protected]';
echo (filter_var($email, FILTER_VALIDATE_EMAIL) ? 'V' : 'Inv'), 'alid email format.';
Valid email format.
Multiple delimited blocks of text and a multitude of @ symbols can exist. Both of these examples are valid email addresses:
$email = 'a."b@c".d."@"[email protected]';
$email = '/."@@@@@@"./@a.b';
Based on Michael Berkowski's explode answer, this email address would be split like this:
$email = 'a."b@c"[email protected]';
$parts = explode('@', $email);
$user = $parts[0];
$domain = '@' . $parts[1];
User: a."b"
Domain: @c".d
Anyone using this solution should beware of potential abuse: accepting an email address based on these outputs and then inserting $email into a database could have negative consequences.
$email = 'a."b@c".d@INSERT BAD STUFF HERE';
The functions below are only accurate as long as filter_var is used to validate the address first.
Here is a simple solution, using neither regex nor explode, that finds the first @ that is not contained within quoted text. filter_var treats nested delimited text as invalid, so finding the proper @ is a very simple search: track whether you are inside quotes (skipping backslash-escaped characters) and stop at the first @ outside them.
if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $d = strlen($email);
    $contained = false; // true while inside a quoted ("...") part of the local part
    for($i = 0; $i < $d; ++$i) {
        if($contained) {
            if($email[$i] === '\\')
                ++$i; // skip the character escaped by a backslash
            elseif($email[$i] === '"')
                $contained = false; // closing quote
        }
        elseif($email[$i] === '@')
            break; // first @ outside quotes: the real separator
        elseif($email[$i] === '"')
            $contained = true; // opening quote
    }
    $local = substr($email, 0, $i);
    $domain = substr($email, $i);
}
Here is the same code wrapped in a function.
function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $d = strlen($email);
    $contained = false;
    for($i = 0; $i < $d; ++$i) {
        if($contained) {
            if($email[$i] === '\\')
                ++$i;
            elseif($email[$i] === '"')
                $contained = false;
        }
        elseif($email[$i] === '@')
            break;
        elseif($email[$i] === '"')
            $contained = true;
    }
    return array('local' => substr($email, 0, $i), 'domain' => substr($email, $i));
}
In use:
$email = 'a."b@c".x."@".d.e@f.g';
$email = parse_email($email);
if($email !== false)
print_r($email);
else
echo 'Bad email address.';
Array ( [local] => a."b@c".x."@".d.e [domain] => @f.g )
$email = 'a."b@c".x."@".d.e@f.g@';
$email = parse_email($email);
if($email !== false)
print_r($email);
else
echo 'Bad email address.';
Bad email address.
After some testing of filter_var and some research into what is acceptable as a valid domain name (hostnames separated by dots), I wrote this simpler function for better performance. In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.
if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $at = strrpos($email, '@'); // position of the last @
    $local = substr($email, 0, $at);
    $domain = substr($email, $at);
}
As a function:
function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $at = strrpos($email, '@');
    return array('local' => substr($email, 0, $at), 'domain' => substr($email, $at));
}
Or using explode and implode:
if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = explode('@', $email);
    $domain = '@' . array_pop($local);
    $local = implode('@', $local);
}
As a function:
function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = explode('@', $email);
    $domain = '@' . array_pop($email);
    return array('local' => implode('@', $email), 'domain' => $domain);
}
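If you would still like to use a regex, splitting the string from the end of a valid email address is the safest option.
/(.*)(@.*)$/
(.*) Matches anything. Because it is greedy, it gives characters back only until the rest of the pattern can match, so the split lands on the last @.
(@.*) Matches anything that begins with an @ symbol.
$ End of the string.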
if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $domain = $local[1];
    $local = $local[0];
}
As a function:
function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    return array('local' => $email[0], 'domain' => $email[1]);
}
Or
if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    preg_match('/(.*)(@.*)$/', $email, $matches);
    $local = $matches[1];
    $domain = $matches[2];
}
As a function:
function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    preg_match('/(.*)(@.*)$/', $email, $matches);
    return array('local' => $matches[1], 'domain' => $matches[2]);
}
Upvotes: 9
Reputation: 149
Use a regular expression. For example:
$mailadress = "email@company.com";
$exp_arr = preg_match_all("/(.*)@(.*)\.(.*)/", $mailadress, $newarr, PREG_SET_ORDER);
/*
Array output:
Array
(
[0] => Array
(
[0] => email@company.com
[1] => email
[2] => company
[3] => com
)
)
*/
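Because the second group is greedy, it runs up to the last dot, so with a multi-label domain group 2 is not the whole domain. A minimal sketch with preg_match (one address, so preg_match_all is not needed); the address email@mail.company.com is made up for illustration.

```php
<?php
// (.*)@(.*)\.(.*): group 2 is greedy and backtracks only to the LAST dot.
preg_match('/(.*)@(.*)\.(.*)/', 'email@mail.company.com', $m);
echo $m[1], "\n"; // email
echo $m[2], "\n"; // mail.company
echo $m[3], "\n"; // com
```

If you need the domain in one piece, capture everything after the @ in a single group instead.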
Upvotes: 0
Reputation: 1061
$parts = explode("@", $email);
$domain = array_pop($parts);
$name = implode("@",$parts);
This solves both of Brogan's edge cases (a."b@c".d."@"[email protected] and /."@@@@@@"./@a.b), as you can see in this Ideone.
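A minimal sketch of the same three lines wrapped in a hypothetical split_on_last_at() function and run on two of Brogan's addresses. It does no validation of its own, so call filter_var first, as Brogan recommends.

```php
<?php
// Everything after the last "@" is the domain; everything before it,
// re-joined with "@", is the name (so quoted "@"s in the local part survive).
function split_on_last_at($email)
{
    $parts = explode('@', $email);
    $domain = array_pop($parts);  // text after the last "@"
    $name = implode('@', $parts); // text before it, inner "@"s restored
    return array('name' => $name, 'domain' => $domain);
}

foreach (array('/."@@@@@@"./@a.b', 'a."b@c".x."@".d.e@f.g') as $email) {
    print_r(split_on_last_at($email));
}
```

The first address should give the name /."@@@@@@"./ and the domain a.b.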
The currently accepted answer is not valid because of the multiple "@" case.
I loved @Brogan's answer until I read his last sentence:
In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.
That is supported by this other answer. And if that's true, his answer seems unnecessarily complex.
Upvotes: 2
Reputation: 458
If you want a preg_match solution, you could also do something like this:
preg_match('/([^<]+)(@[^<]+)/', 'johndoe@example.com', $matches);
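A quick sketch of what $matches holds with this pattern. [^<]+ also matches @, so the greedy first group gives characters back only until (@[^<]+) can match, which means the split lands on the last @.

```php
<?php
preg_match('/([^<]+)(@[^<]+)/', 'johndoe@example.com', $matches);
echo $matches[1], "\n"; // johndoe
echo $matches[2], "\n"; // @example.com

// With more than one "@", group 1 keeps everything up to the last "@".
preg_match('/([^<]+)(@[^<]+)/', 'a@b@c.d', $m2);
echo $m2[1], "\n"; // a@b
echo $m2[2], "\n"; // @c.d
```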
Upvotes: 0
Reputation: 376
Using explode is probably the best approach here, but to do it with regex you would do something like this:
/^([^@]*)(@.*)/
^ start of string
([^@]*) anything that is not an @ symbol ($matches[1])
(@.*) @ symbol followed by anything ($matches[2])
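A minimal sketch of this pattern with preg_match. $matches[0] holds the whole match, so the two parts are in $matches[1] and $matches[2]. Unlike the greedy patterns above, [^@]* stops at the first @, so an address with a quoted @ in its local part (see Brogan's answer) would be split too early.

```php
<?php
// Group 1: everything before the FIRST "@"; group 2: that "@" and the rest.
if (preg_match('/^([^@]*)(@.*)/', 'johndoe@example.com', $matches)) {
    echo $matches[1], "\n"; // johndoe
    echo $matches[2], "\n"; // @example.com
}
```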
Upvotes: 3
Reputation: 270607
$parts = explode('@', "johndoe@example.com");
$user = $parts[0];
// Stick the @ back onto the domain since it was chopped off.
$domain = "@" . $parts[1];
Upvotes: 30