Reputation: 1067
I have HTML content stored in a database table. In that HTML content I want to replace "SOME WORDS" with a link tag. But if "SOME WORDS" is already inside a link tag, it should be left alone.
e.g.
The content
<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>
The output should be
<p>Lorem ipsum dolor <a href="http://someurl">SOME WORDS</a>, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>
As you can see, the replacement should skip text that is already inside a link.
Some guidance to get on the right track would be very much appreciated.
Upvotes: 5
Views: 899
Reputation: 173562
This is how you could solve it using DOMDocument instead of regular expressions:
$contents = <<<EOS
<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>
EOS;

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($contents);
libxml_clear_errors();

$xp = new DOMXPath($doc);

// find all text nodes
foreach ($xp->query('//text()') as $node) {
    // make sure it's not directly inside an anchor
    if ($node->parentNode->nodeName !== 'a') {
        $node->nodeValue = str_replace(
            'SOME WORDS',
            'SOME OTHER WORDS',
            $node->nodeValue
        );
    }
}

// DOMDocument creates a full document and puts your fragment inside a body tag
// So we enumerate the children and save their HTML representation
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
    echo $doc->saveHTML($node);
}
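A variant of the DOM approach that inserts a real link element, rather than replacement text, might look like the sketch below. The function name `linkify_phrase` and its parameters are made up for illustration. It assumes the input is an HTML fragment that `loadHTML` can parse, and it uses `ancestor::a` in the XPath query so that text nested deeper inside a link (for example inside `<a><b>...</b></a>`) is skipped as well.

```php
<?php
// Hypothetical helper: wraps each occurrence of $phrase in a link to $url,
// skipping any text that already sits somewhere inside an <a> element.
function linkify_phrase(string $html, string $phrase, string $url): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($doc);
    // Take a snapshot of the matching text nodes, because the loop edits the tree.
    $nodes = iterator_to_array($xp->query('//text()[not(ancestor::a)]'));

    foreach ($nodes as $node) {
        $parts = explode($phrase, $node->nodeValue);
        if (count($parts) < 2) {
            continue; // phrase does not occur in this text node
        }
        $parent = $node->parentNode;
        foreach ($parts as $i => $part) {
            if ($i > 0) {
                // Between two parts there was one occurrence of the phrase.
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($phrase));
                $parent->insertBefore($a, $node);
            }
            if ($part !== '') {
                $parent->insertBefore($doc->createTextNode($part), $node);
            }
        }
        $parent->removeChild($node);
    }

    // Serialize only the children of <body>, as in the answer above.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `linkify_phrase($contents, 'SOME WORDS', 'http://someurl')` on the sample from the question should link the first occurrence and leave the existing `example.com` link untouched.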
Upvotes: 3
Reputation: 10169
If you have room for 3 lines, this is a simple option (the quantifiers are lazy so that each match stops at the first SOME WORDS and the nearest closing tag):
$text = preg_replace('~<a\b(.*?)(SOME WORDS)(.*?)</a>~', '<a$1PLACEHOLDER$3</a>', $text);
$text = preg_replace('~SOME WORDS~', 'REPLACEMENT WORDS', $text);
$text = preg_replace('~PLACEHOLDER~', 'SOME WORDS', $text);
It uses a PLACEHOLDER text/tag/whatever so that you don't replace the contents of a link (in case there is one).
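The same placeholder idea can be sketched with a callback that first matches each whole `<a>...</a>` element and hides the phrase only inside it. The function name `replace_outside_links` and the placeholder string are assumptions for illustration; the placeholder must not occur in the real content, and the sketch does not protect the phrase inside attributes of other tags.

```php
<?php
// Hypothetical helper: replaces $phrase with $replacement everywhere
// except inside <a>...</a> elements, using a temporary placeholder.
function replace_outside_links(string $text, string $phrase, string $replacement): string
{
    // Assumed never to appear in the input.
    $placeholder = "\x00PLACEHOLDER\x00";

    // 1. Hide every occurrence of the phrase that lies inside a link element.
    $text = preg_replace_callback(
        '~<a\b[^>]*>.*?</a>~is',
        function (array $m) use ($phrase, $placeholder): string {
            return str_replace($phrase, $placeholder, $m[0]);
        },
        $text
    );

    // 2. Replace the occurrences that are still visible.
    $text = str_replace($phrase, $replacement, $text);

    // 3. Restore the hidden occurrences.
    return str_replace($placeholder, $phrase, $text);
}
```

Because each link is matched on its own, several links on the same line are handled independently.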
Upvotes: 1
Reputation: 3362
A simple regex will only work if the link contains exactly the phrase, without any other symbols or words. Instead, you could iterate through all occurrences of SOME WORDS and check whether each one is inside a link by counting how many opening and closing link tags appear before the occurrence. Try this code:
$str = '<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>';
echo 'Before:' . $str;

$str_lc = strtolower($str);
$phrase = 'SOME WORDS';
$link = '<a href="http://someurl">SOME WORDS</a>';
$offset = 0;

// Compare against false explicitly: a match at position 0 is falsy
while (($position = strpos($str, $phrase, $offset)) !== false)
{
    // As many closing as opening tags before this point means we are outside a link
    if (substr_count($str_lc, "<a", 0, $position) <= substr_count($str_lc, "</a>", 0, $position)) {
        $str = substr_replace($str, $link, $position, strlen($phrase));
        $str_lc = strtolower($str);
        // Continue searching after the link that was just inserted
        $offset = $position + strlen($link);
    } else {
        $offset = $position + 1;
    }
}
echo 'After:' . $str;
Upvotes: 1
Reputation: 1006
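This should do the trick.
Just check in the regex that SOME WORDS is not directly enclosed by tags. Lookarounds are used so that the neighbouring characters are not part of the match and are not lost in the replacement:
$str = preg_replace('/(?<!>)SOME WORDS(?!<)/', '<a href="http://someurl">SOME WORDS</a>', $str);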
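A check on the neighbouring characters only catches links whose entire text is the phrase. One possible regex-only alternative, sketched here under the assumption that links are not nested and the markup is reasonably well formed, uses the PCRE verbs `(*SKIP)(*F)` to step over whole `<a>...</a>` elements and match the phrase only elsewhere. The function name `link_outside_anchors` is made up for this example.

```php
<?php
// Hypothetical helper: links every occurrence of $phrase that is not
// inside an existing <a>...</a> element.
function link_outside_anchors(string $str, string $phrase, string $url): string
{
    // First alternative: consume a whole link element, then force a failure
    // and skip past it. Second alternative: the phrase itself.
    $pattern = '~<[aA]\b[^>]*>.*?</[aA]>(*SKIP)(*F)|' . preg_quote($phrase, '~') . '~s';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($url): string {
            return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">' . $m[0] . '</a>';
        },
        $str
    );
}
```

A callback is used instead of a replacement string so that characters such as `$` in the URL are not interpreted as backreferences.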
Upvotes: 0