user1145643

Reputation: 898

Regular Expression For Multiple Possibilities

I have a piece of code that is extremely useful for replacing an email address with an actual link. Sometimes email addresses have more than one suffix (e.g. .co.uk vs .com). I am able to create a regex for each of these scenarios, but I'm curious whether any regex gurus out there know of a way to combine the two into a single expression. If so, could you please explain what your answer is and why it works?

Here is my current code -

$input = "here is a line of text, [email protected], [email protected], [email protected] here";

preg_match_all('%\w+\@\w+\.\w+\.\w+%', $input, $matches);

$outmatch = Array();

if(is_array($matches[0])){
    foreach($matches[0] as $match){
        array_push($outmatch,$match);
    }
}

$outmatch = array_unique($outmatch);

if(is_array($outmatch)){
    foreach($outmatch as $outm){
        $input = str_replace($outm,'<a href="mailto:' . $outm . '">' . $outm . '</a>',$input);
    }
}

print $input;

Expression for two suffixes: %\w+\@\w+\.\w+\.\w+%

Expression for one suffix: %\w+\@\w+\.\w+%

Upvotes: 1

Views: 1322

Answers (3)

hek2mgl

Reputation: 157967

First, I am not aiming to develop the perfect regex for matching email addresses in this post. I just want to help the questioner a little bit. :)


The following regex matches a domain and its TLD, plus one optional additional dot-separated part, so it covers both the .com and the .co.uk style of address.

preg_match_all('%\w+\@\w+\.\w+(\.\w+)?%', $input, $matches);

So it matches:

[email protected]

[email protected]

[email protected]

... and so on. But it would not match:

test@test

.. because the tld is missing.
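To see the optional group at work, here is a minimal sketch. The addresses are hypothetical placeholders chosen for illustration, not the ones from the question.

```php
<?php
// Hypothetical sample input: one single-suffix address, one double-suffix
// address, and one string without any TLD.
$input = "a@example.com, b@example.co.uk, c@example";

// (\.\w+)? makes the second suffix optional, so one pattern covers both cases.
preg_match_all('%\w+\@\w+\.\w+(\.\w+)?%', $input, $matches);

// $matches[0] holds the full matches; $matches[1] holds the optional group.
print_r($matches[0]);
```

Here "c@example" is skipped because the pattern requires at least one dot after the domain name.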


Further note that a valid email user name can contain characters such as the dot (.), so \w would not match all possible addresses. A better pattern might look like this:

preg_match_all('%[a-zA-Z0-9._\%+-]+\@\w+\.\w+(\.\w+)?%', $input, $matches);

Further note :) A valid domain name can also contain digits and special characters such as the hyphen (-). This results in a regex like this:

preg_match_all('%[a-zA-Z0-9._\%\+\-]+\@[a-zA-Z0-9\-]+\.\w+(\.\w+)?%', $input, $matches);
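As a rough comparison, the sketch below runs the plain \w pattern and the character-class pattern against the same input. The address is a made-up example with a dotted user name and a hyphenated domain.

```php
<?php
// Hypothetical address with a dot in the user name and a hyphen in the domain.
$input = "first.last@my-domain.co.uk";

// Plain \w version: \w matches neither "." in the user name nor "-" in the domain.
preg_match_all('%\w+\@\w+\.\w+(\.\w+)?%', $input, $simple);

// Character-class version from above.
$pattern = '%[a-zA-Z0-9._\%\+\-]+\@[a-zA-Z0-9\-]+\.\w+(\.\w+)?%';
preg_match_all($pattern, $input, $refined);

print_r($simple[0]);
print_r($refined[0]);
```

With this input the \w version finds nothing at all, because the hyphen in the domain stops it before it reaches a dot, while the refined pattern captures the whole address.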

Further note :) :) A valid email address can also look like:

[email protected]

.. with no domain name. Also note that an email address without a TLD can be valid too. As you can see, creating a regex that truly matches every email address isn't that easy.

I would advise you to use a well-documented pattern from the web that has been refined over time.

Upvotes: 2

Andrew Cheong

Reputation: 30273

Use alternation ;)

preg_match_all('%\w+\@\w+\.\w+\.\w+|\w+\@\w+\.\w+%', $input, $matches);
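One thing worth knowing about alternation: the alternatives are tried from left to right, so the longer one has to come first. A small sketch with a hypothetical address shows the difference:

```php
<?php
// Hypothetical double-suffix address.
$input = "b@example.co.uk";

// Longer alternative first: the two-suffix branch gets the first try.
preg_match_all('%\w+\@\w+\.\w+\.\w+|\w+\@\w+\.\w+%', $input, $longFirst);

// Shorter alternative first: the one-suffix branch succeeds early and the
// trailing ".uk" is left out of the match.
preg_match_all('%\w+\@\w+\.\w+|\w+\@\w+\.\w+\.\w+%', $input, $shortFirst);

print_r($longFirst[0]);
print_r($shortFirst[0]);
```

With the order swapped, the match stops at "b@example.co", which is why the pattern above lists the two-suffix form first.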

Upvotes: 0

Mr. Llama

Reputation: 20889

This might work for you: %\w+\@(?:\w+\.)*\w+\.\w+%

It allows as many sub-domains or TLDs as necessary.

The (?:\w+\.)* means "zero or more occurrences of a sub-domain followed by a dot". The (?: makes the group non-capturing, so it is used for grouping only and does not add an entry to $matches.
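As a sketch of how this could be used for the link substitution from the question, the pattern can be passed straight to preg_replace, where $0 in the replacement stands for the whole match. The input string is a hypothetical example, and swapping the question's loop for preg_replace is my own suggestion rather than part of the original code.

```php
<?php
// Hypothetical input with a single-suffix and a multi-part address.
$input = "mail a@example.com or b@mail.example.co.uk today";

// (?:\w+\.)* consumes any number of leading "label." parts without capturing.
$pattern = '%\w+\@(?:\w+\.)*\w+\.\w+%';

// $0 refers to the full match, so each address becomes a mailto link in one pass.
$output = preg_replace($pattern, '<a href="mailto:$0">$0</a>', $input);

print $output;
```

Doing the replacement in a single pass also sidesteps a possible problem with looping over str_replace: if one address is a substring of another, it could be replaced twice.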

Upvotes: 0
