EastSw

Reputation: 1067

PHP regular expression to replace "some words" with a link tag, excluding "some words" already inside link tags

I have HTML content stored in a database table. In that content I want to replace "SOME WORDS" with a link tag, but occurrences of "SOME WORDS" that are already inside a link tag should be left alone.

For example, given the content

<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>

The output should be

<p>Lorem ipsum dolor <a href="http://someurl">SOME WORDS</a>, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>

As you can see, existing link text should be excluded from the replacement.

Any guidance to get on the right track would be much appreciated.

Upvotes: 5

Views: 899

Answers (4)

Ja͢ck

Reputation: 173562

This is how you could solve it using DOMDocument instead of regular expressions:

$contents = <<<EOS
<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>
EOS;

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($contents);
libxml_clear_errors();

$xp = new DOMXPath($doc);

// find all text nodes
foreach ($xp->query('//text()') as $node) {
        // make sure it's not inside an anchor
        if ($node->parentNode->nodeName !== 'a') {
                $node->nodeValue = str_replace(
                    'SOME WORDS', 
                    'SOME OTHER WORDS', 
                    $node->nodeValue
                );
        }
}
// DOMDocument creates a full document and puts your fragment inside a body tag
// So we enumerate the children and save their HTML representation
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
        echo $doc->saveHTML($node);
}
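The code above swaps in plain text and only checks the direct parent of each text node. As a minimal sketch of how the same DOM approach could produce the link the question asks for, the hypothetical helper below (its name and parameters are assumptions, not part of the answer) selects text nodes with no `<a>` ancestor at any depth and rebuilds each one with anchor elements around the phrase. It assumes a non-empty phrase and, like the code above, relies on `loadHTML` defaults, so non-ASCII content may need extra encoding handling.

```php
<?php
// Hypothetical helper; the name and parameters are illustrative assumptions.
function linkifyOutsideAnchors(string $html, string $phrase, string $url): string
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($doc);
    // Text nodes that have no <a> ancestor at any depth.
    // Copy to an array first because the loop modifies the tree.
    $textNodes = iterator_to_array($xp->query('//text()[not(ancestor::a)]'));

    foreach ($textNodes as $node) {
        $parts = explode($phrase, $node->nodeValue);
        if (count($parts) < 2) {
            continue; // phrase not present in this text node
        }
        // Rebuild the text as: part, <a>phrase</a>, part, <a>phrase</a>, ...
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i > 0) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($phrase));
                $fragment->appendChild($a);
            }
            if ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // As in the answer's code: serialize the children of the generated <body>.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `linkifyOutsideAnchors($contents, 'SOME WORDS', 'http://someurl')` on the sample content should leave the existing link untouched and wrap only the first occurrence.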

Upvotes: 3

CSᵠ

Reputation: 10169

If you have room for 3 lines this would be a safe bet:

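// 1. Shield link text with a placeholder. [^<]* keeps the match inside one link; a greedy .* would span several.
$text = preg_replace('~(<a\b[^>]*>[^<]*)SOME WORDS([^<]*</a>)~', '$1PLACEHOLDER$2', $text);
// 2. Replace every remaining occurrence.
$text = preg_replace('~SOME WORDS~', 'REPLACEMENT WORDS', $text);
// 3. Restore the shielded link text.
$text = preg_replace('~PLACEHOLDER~', 'SOME WORDS', $text);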

It swaps the phrase inside links for a PLACEHOLDER (any text or tag that cannot otherwise occur in the content), so the second step does not touch link contents, if there are any.
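The placeholder exists only to shield link text from the general replacement. As an alternative sketch, PCRE's backtracking verbs `(*SKIP)(*F)` can express the same shielding in one pass: the first alternative matches a whole `<a ...>...</a>` element and then forces a failure, so the engine skips past it, while the second alternative matches the phrase everywhere else. The function name and parameters below are hypothetical, and the pattern assumes links are not nested and that the phrase does not appear inside other tags' attributes.

```php
<?php
// Hypothetical helper; the name and parameters are illustrative assumptions.
function linkifyWithSkip(string $html, string $phrase, string $url): string
{
    // Alternative 1: consume an entire link element, then skip it via (*SKIP)(*F).
    // Alternative 2: the phrase itself, which can now only match outside links.
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*F)|' . preg_quote($phrase, '~') . '~is';

    $href = htmlspecialchars($url, ENT_QUOTES);

    // A callback avoids having to escape "$" or "\" in a replacement string.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($href) {
            // $m[0] is the matched phrase as it appears in the source.
            return '<a href="' . $href . '">' . $m[0] . '</a>';
        },
        $html
    );
}

$text = '<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. '
      . '<a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus.</p>';
echo linkifyWithSkip($text, 'SOME WORDS', 'http://someurl'), "\n";
```

Unlike the placeholder version, this does not depend on choosing a marker string that is guaranteed to be absent from the content, and it handles any number of occurrences inside a single link.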

Upvotes: 1

Ranty

Reputation: 3362

A simple regex only works when the link contains exactly the phrase, with no other symbols or words around it. Instead, you could iterate over all occurrences of SOME WORDS and decide whether each one is inside a link by counting the opening and closing link tags that precede it. Try this code:

$str = '<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>';
echo 'Before:' . $str;
$str_lc = strtolower($str);
$phrase = 'SOME WORDS';
$link = '<a href="http://someurl">SOME WORDS</a>';
$offset = 0;
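// Compare against false explicitly: a match at position 0 is falsy and would end the loop early.
while (($position = strpos($str, $phrase, $offset)) !== false)
{
    // Outside a link when every "<a" seen so far has been closed.
    // Note: "<a" also matches tags such as <abbr>, so this is only a rough check.
    if (substr_count($str_lc, "<a", 0, $position) <= substr_count($str_lc, "</a>", 0, $position)) {
        $str = substr_replace($str, $link, $position, strlen($phrase));
        $str_lc = strtolower($str);
        // Continue searching after the link that was just inserted.
        $offset = $position + strlen($link);
    } else {
        $offset = $position + 1;
    }
}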
echo 'After:' . $str;

Upvotes: 1

ashiina

Reputation: 1006

This should do the trick.

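Just check in the regex that SOME WORDS is not directly enclosed by tags. Lookarounds do this without consuming the neighbouring characters, which plain character classes such as [^>] and [^<] would match and then lose in the replacement:

$str = preg_replace('/(?<!>)SOME WORDS(?!<)/', '<a href="http://someurl">SOME WORDS</a>', $str);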

Upvotes: 0
