Reputation: 898
I have a piece of code that is extremely useful for replacing an email address with an actual link. Sometimes email addresses have more than one suffix (e.g. .co.uk vs. .com). I am able to write a regex for each of these scenarios, but I'm curious whether any regex gurus out there know of a way to combine the two into a single expression. If so, could you please explain your answer and why it works?
Here is my current code:
$input = "here is a line of text, [email protected], [email protected], [email protected] here";
preg_match_all('%\w+\@\w+\.\w+\.\w+%', $input, $matches);
$outmatch = array();
if (is_array($matches[0])) {
    foreach ($matches[0] as $match) {
        array_push($outmatch, $match);
    }
}
$outmatch = array_unique($outmatch);
if (is_array($outmatch)) {
    foreach ($outmatch as $outm) {
        $input = str_replace($outm, '<a href="mailto:' . $outm . '">' . $outm . '</a>', $input);
    }
}
print $input;
Expression for two suffixes: %\w+\@\w+\.\w+\.\w+%
Expression for one suffix: %\w+\@\w+\.\w+%
Upvotes: 1
Views: 1322
Reputation: 157967
First, I am not aiming to develop the perfect regex for matching email addresses in this post. I just want to help the questioner a little bit. :)
The following regex matches a user name, a domain, and its TLD, plus one optional extra suffix (such as the .uk in .co.uk). The (\.\w+)? group is what makes this work: the ? lets the second suffix appear zero times or once.
preg_match_all('%\w+\@\w+\.\w+(\.\w+)?%', $input, $matches);
So it matches:
[email protected]
[email protected]
[email protected]
... and so on. But it would not match:
test@test
... because the TLD is missing.
Note also that a valid email user name can contain characters such as the dot (.), which \w does not match.
So \w+
would not match all possible addresses. A better pattern might look like this:
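A minimal sketch of the pattern above in action, assuming a made-up input string (the addresses are invented for illustration). It also shows the limit of the single optional group: an address with two extra labels is cut off after the first one.

```php
<?php
// Hypothetical sample input; these addresses are invented for illustration.
$input = 'john@example.com, jane@example.co.uk, ops@mail.example.co.uk, root@localhost';

// (\.\w+)? allows at most ONE extra ".label" after the TLD.
preg_match_all('%\w+\@\w+\.\w+(\.\w+)?%', $input, $matches);

// $matches[0] holds the full matches:
//   'john@example.com'     (optional group absent)
//   'jane@example.co.uk'   (optional group matched '.uk')
//   'ops@mail.example.co'  (truncated: only one extra label is allowed)
// 'root@localhost' is skipped because there is no dot after the @.
print_r($matches[0]);
```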
preg_match_all('%[a-zA-Z0-9._\%+-]+\@\w+\.\w+(\.\w+)?%', $input, $matches);
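To see why the character class helps, here is a small sketch comparing the \w+ user name with the bracketed one on a dotted user name (the address is a made-up example):

```php
<?php
// Hypothetical address with a dot in the user name.
$input = 'contact john.doe@example.com today';

// \w+ cannot cross the dot, so only the part after it is found.
preg_match_all('%\w+\@\w+\.\w+(\.\w+)?%', $input, $plain);

// The class [a-zA-Z0-9._\%+-] also accepts . _ % + - in the user name.
preg_match_all('%[a-zA-Z0-9._\%+-]+\@\w+\.\w+(\.\w+)?%', $input, $better);

print_r($plain[0]);  // only 'doe@example.com'
print_r($better[0]); // 'john.doe@example.com'
```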
Further note :) A valid domain name can contain letters, digits, and also the hyphen (-); \w covers letters and digits, but not the hyphen.
This results in a regex like this:
preg_match_all('%[a-zA-Z0-9._\%\+\-]+\@[a-zA-Z0-9\-]+\.\w+(\.\w+)?%', $input, $matches);
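A quick sketch of the difference, assuming a hypothetical hyphenated domain: with \w+ after the @ nothing is found at all, because \w+ stops at the hyphen and the required dot never follows.

```php
<?php
// Hypothetical address with a hyphen in the domain.
$input = 'write to jane@my-site.co.uk';

// \w+ after the @ stops at "my", then expects a dot but sees "-": no match.
preg_match_all('%[a-zA-Z0-9._\%+-]+\@\w+\.\w+(\.\w+)?%', $input, $before);

// [a-zA-Z0-9\-]+ accepts the hyphen, so the whole address is found.
preg_match_all('%[a-zA-Z0-9._\%\+\-]+\@[a-zA-Z0-9\-]+\.\w+(\.\w+)?%', $input, $after);

print_r($before[0]); // empty array
print_r($after[0]);  // 'jane@my-site.co.uk'
```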
Further note :) :) A valid email address can also look like:
[email protected]
... (no domain name at all). Also note that an email address without a TLD can be valid, too. As you can see, creating a regex that really matches every email address isn't that easy.
I would advise you to take a well-documented one from the web that has been refined over time.
Upvotes: 2
Reputation: 30273
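Use alternation ;) Put the longer (two-suffix) alternative first: at each position the engine tries the alternatives from left to right and keeps the first one that matches, so with the one-suffix pattern first a .co.uk address would only be matched up to .co.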
preg_match_all('%\w+\@\w+\.\w+\.\w+|\w+\@\w+\.\w+%', $input, $matches);
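A small sketch showing why the order of the alternatives matters, using invented addresses:

```php
<?php
// Hypothetical input with a one-suffix and a two-suffix address.
$input = 'john@example.com and jane@example.co.uk';

// Longer alternative first: each address is matched in full.
preg_match_all('%\w+\@\w+\.\w+\.\w+|\w+\@\w+\.\w+%', $input, $longFirst);

// Shorter alternative first: it already succeeds on "jane@example.co",
// so the longer alternative is never tried at that position.
preg_match_all('%\w+\@\w+\.\w+|\w+\@\w+\.\w+\.\w+%', $input, $shortFirst);

print_r($longFirst[0]);  // 'john@example.com', 'jane@example.co.uk'
print_r($shortFirst[0]); // 'john@example.com', 'jane@example.co'
```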
Upvotes: 0
Reputation: 20889
This might work for you: %\w+\@(?:\w+\.)*\w+\.\w+%
It allows as many sub-domains or TLDs as necessary. Here's an example of it in action.
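The (?:\w+\.)*
means "zero or more occurrences of a sub-domain followed by a dot". The (?:
makes it a non-capturing group: it groups \w+\. for the * without storing what it matched as a separate entry in $matches.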
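A minimal sketch, assuming invented addresses, showing that the repeated group handles several labels and that the non-capturing version adds no extra entry to $matches (a plain capturing group is included only for comparison):

```php
<?php
// Hypothetical input: one short address and one with several labels.
$input = 'john@example.com, ops@mail.example.co.uk';

// (?:\w+\.)* repeats "label." as often as needed, then \w+\.\w+ ends it.
preg_match_all('%\w+\@(?:\w+\.)*\w+\.\w+%', $input, $matches);

// Same pattern with a capturing group, for comparison only.
preg_match_all('%\w+\@(\w+\.)*\w+\.\w+%', $input, $capturing);

print_r($matches[0]);         // 'john@example.com', 'ops@mail.example.co.uk'
echo count($matches), "\n";   // 1: only the full matches
echo count($capturing), "\n"; // 2: full matches plus the captured group
```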
Upvotes: 0