Reputation: 2632
I have this kind of HTML document.
<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>
And I'd like to surround text1, text2 and text3 with &nbsp;s. What would be the best way? DOMDocument cannot catch strings that are not wrapped in a tag. For text1 and text2, getElementsByTagName('tagname')->item(0)
can be used, but for text3 I'm not sure what to do.
Any ideas?
[Edit]
As Musa suggests, I tried using nextSibling.
<?php
$html = <<<STR
<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>
STR;
$doc = new DOMDocument;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('a') as $nodeA) {
    // Pad whatever node directly follows each link.
    $nodeA->nextSibling->nodeValue = '&nbsp;' . $nodeA->nextSibling->nodeValue . '&nbsp;';
}
echo $doc->saveHtml();
?>
However, &nbsp; gets escaped and converted to &amp;nbsp;.
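A stripped-down sketch that reproduces the escaping, assuming a shortened fragment with a single link followed by bare text. The variable names $text and $escaped are only illustrative and do not appear in the code above.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<div><a href="">link2</a>text3</div>');

// The node right after the <a> is the bare text "text3".
$text = $doc->getElementsByTagName('a')->item(0)->nextSibling;

// On a text node, nodeValue is stored as literal characters,
// so the "&" in "&nbsp;" is plain data rather than the start of an entity.
$text->nodeValue = '&nbsp;' . $text->nodeValue . '&nbsp;';

// The serializer escapes that "&" on output, which is where "&amp;nbsp;" comes from.
$escaped = $doc->saveHTML($text->parentNode);
echo $escaped;
```

So the entity has to enter the document either as the real U+00A0 character or after serialization, not as the string "&nbsp;" assigned to nodeValue.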
Upvotes: 2
Views: 6488
Reputation: 97672
Since setting nodeValue seems to store the value as text rather than HTML, you could use the non-breaking space character itself instead of the HTML entity.
<?php
$html = <<<STR
<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>
STR;
$nbsp = "\xc2\xa0"; // UTF-8 bytes of U+00A0 (non-breaking space)
$doc = new DOMDocument;
// Wrap the fragment in a <div> so its top-level nodes are easy to reach.
$doc->loadHTML('<div>' . $html . '</div>');
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
    if ($node->nodeType == 3) { // nodeType:3 TEXT_NODE
        $node->nodeValue = $nbsp . $node->nodeValue . $nbsp;
    }
}
echo $doc->saveHtml();
?>
Upvotes: 4
Reputation: 2632
One solution I came up with:
<?php
$html = <<<STR
<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>
STR;
$doc = new DOMDocument;
$doc->loadHTML('<div>' . $html . '</div>');
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
    if ($node->nodeType == 3) { // nodeType:3 TEXT_NODE
        $node->nodeValue = '[identical_replacement_string]' . $node->nodeValue . '[identical_replacement_string]';
    }
}
// Swap the placeholder for the entity only after serializing, so it is not escaped.
$output = str_replace("[identical_replacement_string]", "&nbsp;", $doc->saveHtml());
echo $output;
?>
Please feel free to post better solutions.
Upvotes: 0
Reputation: 39872
You should be able to use getElementsByTagName and then iterate over the node list, adding &nbsp; as necessary.
getElementsByTagName('body') (http://php.net/manual/en/domdocument.getelementsbytagname.php) returns a DOMNodeList (http://www.php.net/manual/en/class.domnodelist.php). Take the body element from it with item() (http://www.php.net/manual/en/domnodelist.item.php) and iterate over its child nodes.
The nodeType will let you know what you are dealing with. text3 is a TEXT_NODE, which has a nodeType value of 3.
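A rough sketch of that approach, under a few assumptions: padBodyTextNodes is a hypothetical helper name, the fragment is wrapped in explicit html/body tags before loading, and whitespace-only text nodes between tags are skipped. Only direct text children of the body are touched.

```php
<?php
// Hypothetical helper (not part of DOMDocument): pads each bare text node
// that is a direct child of <body> with the given string.
function padBodyTextNodes(string $html, string $pad): string
{
    $doc = new DOMDocument;
    // Assumption: wrap the fragment so that a <body> element certainly exists.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    $body = $doc->getElementsByTagName('body')->item(0);

    // Copy the live node list into an array before modifying nodes.
    foreach (iterator_to_array($body->childNodes) as $node) {
        // XML_TEXT_NODE is the constant for nodeType 3.
        if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
            $node->nodeValue = $pad . $node->nodeValue . $pad;
        }
    }
    return $doc->saveHTML();
}

// "\xc2\xa0" is the UTF-8 encoding of the non-breaking space character.
echo padBodyTextNodes('<a href="">link2</a>text3<span class="class2">text4</span>', "\xc2\xa0");
```

Text inside elements such as the span is left alone here, because it is a child of that element rather than of the body.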
Hope that gets you going in the right direction.
Upvotes: 2