
Reputation: 6541

PHP Regular expression: exclude href anchor tags

I'm creating a simple search for my application.

I'm using PHP regular expression replacement (preg_replace) to look for a search term (case insensitive) and add <strong> tags around the search term.

preg_replace('/'.$query.'/i', '<strong>$0</strong>', $content);

I'm not the greatest with regular expressions, so what would I add to the pattern so that it does not replace search terms inside the href of an anchor tag?

That way, if someone searched for "info", it wouldn't change a link to "http://something.com/this_<strong>info</strong>/index.html"
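To make the problem concrete, here is a minimal reproduction. The $content string is a made-up sample built around the URL above, and preg_quote() is added only so that special characters in the search term cannot break the pattern.

```php
<?php
// Hypothetical sample content; only the URL comes from the example above.
$content = '<a href="http://something.com/this_info/index.html">More info</a>';
$query   = 'info';

// Same replacement as in the question, with the search term quoted.
$pattern     = '/' . preg_quote($query, '/') . '/i';
$highlighted = preg_replace($pattern, '<strong>$0</strong>', $content);

// Both the link text and the href value end up wrapped in <strong>.
echo $highlighted, "\n";
```

The link text is highlighted as intended, but the href now contains "this_<strong>info</strong>", which breaks the link.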

Upvotes: 6

Views: 2390

Answers (3)

anubhava

Reputation: 784968

I believe you will need conditional subpatterns for this purpose:

$query = "link";
$query = preg_quote($query, '/');

$p = '/((<)(?(2)[^>]*>)(?:.*?))*?(' . $query . ')/smi';
$r = "$1<strong>$3</strong>";

$str = '<a href="/Link/foo/the_link.htm">'."\n".'A Link</a>'; // multi-line text
$nstr = preg_replace($p, $r,  $str);
var_dump( $nstr );

$str = 'Its not a Link'; // non-link text
$nstr = preg_replace($p, $r,  $str);
var_dump( $nstr );

Output (view the page source to see the tags):

string(61) "<a href="/Link/foo/the_link.htm"> 
A <strong>Link</strong></a>"
string(31) "Its not a <strong>Link</strong>"

PS: The regex above also handles multi-line input and, more importantly, it skips matches not only inside an href but inside any HTML tag, that is, anything enclosed in < and >.

EDIT: If you only want to exclude hrefs rather than all HTML tags, use this pattern in place of the one above:

$p = '/((<)(?(2).*?href=[^>]*>)(?:.*?))*?(' . $query . ')/smi';
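For comparison, a simpler and commonly used way to skip everything inside tags is a negative lookahead. This is a different technique from the conditional subpatterns above, and the helper name highlightOutsideTags is made up for this sketch. It assumes the text content contains no literal ">" characters; a term followed by a bare ">" before the next "<" would be skipped.

```php
<?php
// Hypothetical helper, not part of the answer above.
function highlightOutsideTags(string $query, string $content): string
{
    $quoted = preg_quote($query, '/');
    // (?![^<>]*>) rejects a match when the next angle bracket after it is ">",
    // which is the case when the match sits inside a tag such as <a href="...">.
    $pattern = '/' . $quoted . '(?![^<>]*>)/i';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '<strong>$0</strong>', $content) ?? $content;
}

$linkResult = highlightOutsideTags('link', '<a href="/Link/foo/the_link.htm">' . "\n" . 'A Link</a>');
$textResult = highlightOutsideTags('link', 'Its not a Link');

echo $linkResult, "\n";
echo $textResult, "\n";
```

Because the negated character class also matches newlines, this handles multi-line input without the s modifier.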

Upvotes: 1

Jan Turoň

Reputation: 32912

You may use conditional subpatterns, see explanation here: http://cz.php.net/manual/en/regexp.reference.conditional.php

preg_replace("/(?(?<=href=\")([^\"]*\")|($query))/i","\\1<strong>\\2</strong>",$x);

In your case, if you have a whole HTML document rather than just an href="" attribute, there is an easier solution using the 'e' modifier (deprecated since PHP 5.5 and removed in PHP 7.0), which lets you run PHP code to compute each replacement:

function termReplacer($found) {
  $found = stripslashes($found);
  if(substr($found,0,5)=="href=") return $found;
  return "<strong>$found</strong>";
}
echo preg_replace("/(?:href=)?\S*$query/e","termReplacer('\\0')",$x);

See example #4 at http://cz.php.net/manual/en/function.preg-replace.php. If your expression is more complex, you can also use regular expressions inside termReplacer().

One caveat: with the 'e' modifier, PHP escapes quotes and backslashes in the substituted backreferences, so the $found parameter in termReplacer() needs to be passed through stripslashes().
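On PHP 7 and later the 'e' modifier is not available, so the same idea can be sketched with preg_replace_callback(). The sample $x string is hypothetical, and the i modifier and preg_quote() call are additions to match the case-insensitive search in the question. Since the callback receives the raw match, no stripslashes() call is needed.

```php
<?php
// Hypothetical sample input; $x and $query follow the names used above.
$x     = '<a href="http://something.com/this_info/index.html">more info</a> and info';
$query = preg_quote('info', '/');

// The callback receives the match array; $m[0] is the whole match.
function termReplacer(array $m): string
{
    $found = $m[0];
    // Leave matches that start with href= untouched.
    if (strncasecmp($found, 'href=', 5) === 0) {
        return $found;
    }
    return "<strong>$found</strong>";
}

$result = preg_replace_callback("/(?:href=)?\S*$query/i", 'termReplacer', $x);
echo $result, "\n";
```

Note that \S* is greedy, so link text that directly follows the closing quote of the href with no whitespace in between can be swallowed into the href match and left unhighlighted.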

Upvotes: 0

George

Reputation: 901

I'm not 100% sure what you are ultimately after here, but from what I can tell, it's a sort of "search phrase" highlighting facility that highlights keywords. If so, I suggest having a look at the Text Helper in CodeIgniter. It provides a nice little function called highlight_phrase, which could do what you are looking for.

The function is as follows.

function highlight_phrase($str, $phrase, $tag_open = '<strong>', $tag_close = '</strong>')
{
    if ($str == '')
    {
        return '';
    }

    if ($phrase != '')
    {
        return preg_replace('/('.preg_quote($phrase, '/').')/i', $tag_open."\\1".$tag_close, $str);
    }

    return $str;
}
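A short usage sketch follows, with the function body repeated in compact form so the snippet runs on its own outside CodeIgniter. The input strings are made up. It suggests that the helper, as quoted, wraps every occurrence of the phrase, including one inside an href.

```php
<?php
// Compact copy of the helper quoted above, so the snippet is self-contained.
function highlight_phrase($str, $phrase, $tag_open = '<strong>', $tag_close = '</strong>')
{
    if ($str == '') {
        return '';
    }
    if ($phrase != '') {
        return preg_replace('/(' . preg_quote($phrase, '/') . ')/i', $tag_open . "\\1" . $tag_close, $str);
    }
    return $str;
}

// Plain text: the phrase is wrapped as expected.
$plain = highlight_phrase('Here is some info', 'info');

// Hypothetical markup: the occurrence inside the href is wrapped as well.
$markup = highlight_phrase('<a href="/info/">info</a>', 'info');

echo $plain, "\n";
echo $markup, "\n";
```

So for content that contains links, the pattern would still need one of the tag-aware adjustments discussed in the other answers.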

Upvotes: 0
