denied

Reputation: 143

Regular expression for links

I have a string that contains URLs, and I need to replace those URLs with links, but only if their domains are in a whitelist. I have a pattern that replaces URLs with links, but I don't know how to include the list of accepted domains in the pattern. I use the following code:

$pattern = '/\b((http(s?):\/\/)|(?=www\.))(\S+)/is';

$string = preg_replace($pattern,
         '<a href="$1$4" target="_blank">$1$4</a>',
         $string);
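One way to put the whitelist into the pattern itself is to escape each domain with preg_quote(), join them with |, and require the host to end in one of them. The sketch below keeps the question's requirement that a URL start with http://, https:// or www.; the function name link_whitelisted_urls() is hypothetical, and the boundary checks are one possible choice, not the only one.

```php
<?php

// Hypothetical helper: links URLs only when the host is a whitelisted
// domain or a subdomain of one (e.g. mail.google.com for google.com).
function link_whitelisted_urls(string $text, array $whitelist): string
{
    // Escape each domain so "." only matches a literal dot, then join with "|".
    $domains = implode('|', array_map(function ($domain) {
        return preg_quote($domain, '~');
    }, $whitelist));

    // Group 1: "http://", "https://" or an empty match before "www." (as in the question).
    // Group 3: optional subdomain labels + a whitelisted domain + optional port/path.
    // The negative lookahead rejects hosts that continue past the domain,
    // such as google.com.evil.net, but still allows a trailing full stop.
    $pattern = '~\b((https?://)|(?=www\.))((?:[-a-z0-9]+\.)*(?:' . $domains . ')(?![-a-z0-9]|\.[a-z0-9])(?:[:/?#]\S*)?)~i';

    return preg_replace($pattern, '<a href="$1$3" target="_blank">$1$3</a>', $text);
}
```

For example, link_whitelisted_urls('see www.google.com now', array('google.com')) links only www.google.com. As with the original pattern, a www. URL ends up in the href without a scheme, so a browser treats it as a relative link.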

Upvotes: 0

Views: 150

Answers (1)

Quixrick

Reputation: 3200

Before running your regex, you can simply check whether the URL appears in the whitelist:

<?php

$whitelist = array('http://www.google.com', 'http://www.yahoo.com');

$string = 'http://www.google.com';

// Only link the string if it exactly matches one of the whitelisted URLs
if (in_array($string, $whitelist)) {

    $pattern = '/\b((http(s?):\/\/)|(?=www\.))(\S+)/is';

    $string = preg_replace($pattern, '<a href="$1$4" target="_blank">$1$4</a>', $string);

}

print $string;
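Note that in_array() only succeeds on an exact match, so http://www.google.com/search or https://www.google.com would be rejected. A sketch that compares just the host instead, using parse_url(); the helper name is_whitelisted_url() is hypothetical, and treating subdomains as allowed is an assumption:

```php
<?php

// Hypothetical helper: true when the URL's host is a whitelisted domain or a
// subdomain of one, regardless of scheme, path or query string.
function is_whitelisted_url(string $url, array $whitelist): bool
{
    // parse_url() reads a bare "www.google.com" as a path, not a host,
    // so assume http:// when no scheme is given.
    if (!preg_match('~^https?://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // malformed URL or no host
    }
    $host = strtolower($host);

    foreach ($whitelist as $domain) {
        $domain = strtolower($domain);
        // Exact match, or a host ending in ".<domain>" (a subdomain).
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }

    return false;
}
```

With this, the whitelist can hold bare domains such as array('google.com', 'yahoo.com'), and is_whitelisted_url($string, $whitelist) can take the place of the in_array() check.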

EDIT:

So for this, I split the string into an array of words and looped through them. For each word, I checked whether it matched any of the whitelisted domains. If it did, I applied your regex; if not, I left it alone. Then I collected each word in a new array and joined that back into a string. I also applied CodeAngry's suggestion of using ~ instead of / as the delimiter when matching URLs.

<?php

$domain_array_new = array();    
$whitelist = array('google.com', 'yahoo.com');

$string = 'subdomain.google.com Lorem yahoo.com Ipsum is simply microsoft.com dummy text www.google.com of the printing and typesetting industry.';

$domain_array = explode(' ', $string);

foreach ($domain_array AS $domain_part) {

    foreach ($whitelist AS $whitelist_domain) {

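        // Substring test: the whitelisted domain may appear anywhere in the word
        if (preg_match('/'.preg_quote($whitelist_domain, '/').'/', $domain_part)) {

            $pattern = '~\b((http(s?)://)|(?=www\.))(\S+)~is';
            $domain_part = preg_replace($pattern, '<a href="$1$4" target="_blank">$1$4</a>', $domain_part);

            // Stop after the first match so a word is never linked twice
            break;

        }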

    }

    $domain_array_new[] = $domain_part;

}

$string = implode(' ', $domain_array_new);

print $string;

Now, this works to some extent, but your regular expression needs more work. The only URL it picked up was www.google.com. It did not pick up yahoo.com or subdomain.google.com, because neither starts with http://, https:// or www.
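If you keep the split-and-loop approach, one option is to decide per word by its host rather than by an http/www prefix, so bare yahoo.com and subdomain.google.com get linked too. It also avoids the plain substring test, which lets a word like http://notgoogle.com through. A sketch; link_whitelisted_words() is a hypothetical name, and defaulting bare domains to http:// is an assumption:

```php
<?php

// Hypothetical helper: the same explode/loop/implode idea, but a word is
// linked when its host is a whitelisted domain or one of its subdomains,
// whether or not it starts with http://, https:// or www.
function link_whitelisted_words(string $string, array $whitelist): string
{
    $parts = explode(' ', $string);

    foreach ($parts as $i => $part) {
        // Assumption: bare domains default to http://. A scheme is also
        // what lets parse_url() recognise the host.
        $href = preg_match('~^https?://~i', $part) ? $part : 'http://' . $part;
        $host = strtolower((string) parse_url($href, PHP_URL_HOST));

        foreach ($whitelist as $domain) {
            $domain = strtolower($domain);
            // Exact host, or a host ending in ".<domain>" (a subdomain).
            if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
                $parts[$i] = '<a href="' . htmlspecialchars($href) . '" target="_blank">'
                    . htmlspecialchars($part) . '</a>';
                break; // link each word at most once
            }
        }
    }

    return implode(' ', $parts);
}
```

Splitting on single spaces keeps the original behaviour; a word with trailing punctuation (for example "google.com.") is left unlinked in this sketch.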

EDIT #2:

I played around with this a little more and came up with a simpler method: a single find-and-replace, instead of breaking the string into an array, processing each part and joining it back into a string.

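<?php

// YOUR WHITELIST ARRAY
$whitelist = array('google.com', 'yahoo.com', 'microsoft.com');

// THE TEXT TO SEARCH (same sample string as above)
$string = 'subdomain.google.com Lorem yahoo.com Ipsum is simply microsoft.com dummy text www.google.com of the printing and typesetting industry.';

// TURN YOUR ARRAY INTO AN "OR" STRING TO BE USED FOR MATCHING
// (preg_quote() escapes the dots so they only match a literal ".")
$whitelist_matching_string = implode('|', array_map(function ($domain) {
    return preg_quote($domain, '~');
}, $whitelist));

// DO AN INLINE FIND/REPLACE ($3 is the "s" of https://, if present)
$string = preg_replace('~((http(s)?://)?(([-A-Z0-9.]+)?('.$whitelist_matching_string.')(\S+)?))~i', '<a href="http$3://$4">$1</a>', $string);

print $string;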
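One caveat with this pattern: the optional [-A-Z0-9.]+ prefix and the trailing (\S+)? also match notgoogle.com and google.com.evil.net. Below is a sketch of the same single-pass idea with boundary checks on both sides, using preg_replace_callback() so an existing https:// is kept. The name link_whitelisted_inline() is hypothetical, and the boundary rules are assumptions about what should count as a match:

```php
<?php

// Hypothetical helper: one preg_replace_callback() pass that links bare or
// prefixed URLs whose host is a whitelisted domain or a subdomain of one.
function link_whitelisted_inline(string $string, array $whitelist): string
{
    // Escape each domain so "." only matches a literal dot.
    $domains = implode('|', array_map(function ($domain) {
        return preg_quote($domain, '~');
    }, $whitelist));

    // (?<![-a-z0-9.])          : not preceded by a host character, so "notgoogle.com" fails
    // (https?://)?             : optional scheme (group 1)
    // (?:[-a-z0-9]+\.)*        : optional subdomain labels
    // (?![-a-z0-9]|\.[a-z0-9]) : host must end here, so "google.com.evil.net" fails
    // (?:[:/?#]\S*)?           : optional port, path, query or fragment
    $pattern = '~(?<![-a-z0-9.])(https?://)?(?:[-a-z0-9]+\.)*(?:' . $domains . ')'
        . '(?![-a-z0-9]|\.[a-z0-9])(?:[:/?#]\S*)?~i';

    return preg_replace_callback($pattern, function (array $m) {
        $text = $m[0];
        // Keep an existing scheme; assume http:// for bare domains.
        $href = (isset($m[1]) && $m[1] !== '') ? $text : 'http://' . $text;
        return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($text) . '</a>';
    }, $string);
}
```

A callback is used here instead of a plain replacement string because whether to add http:// depends on what was matched.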

Let me know if this works better for you.

Upvotes: 1
