I've installed a syntax highlighter, but in order for it to work, the tags must be written as &lt; and &gt;. What I need to do is replace all <'s with &lt; and >'s with &gt;, but only inside the PRE tag.
So, in short, I want to escape all HTML characters inside of the pre tag.
Thanks in advance.
Upvotes: 2
Views: 1638
Reputation: 545588
You need to parse the input HTML. Use the DOMDocument class to represent your document, parse the input, find all <pre> tags (using getElementsByTagName) and escape their content.
Unfortunately, the DOM model is very low-level and forces you to iterate over the child nodes of the <pre> tag yourself in order to escape them. This looks as follows:
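function escapeRecursively($node) {
    // Text nodes contribute their (decoded) text as-is.
    if ($node instanceof DOMText)
        return $node->textContent;

    // Other nodes are rebuilt as literal markup: opening tag, children, closing tag.
    // Note: attributes are not re-emitted.
    $children = $node->childNodes;
    $content = "<$node->nodeName>";
    for ($i = 0; $i < $children->length; $i += 1) {
        $child = $children->item($i);
        $content .= escapeRecursively($child);
    }
    return "$content</$node->nodeName>";
}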
Now this function can be used to escape every <pre> node in the document:
function escapePreformattedCode($html) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $pres = $doc->getElementsByTagName('pre');
    for ($i = 0; $i < $pres->length; $i += 1) {
        $node = $pres->item($i);
        // Serialize the children of this <pre> back into a markup string.
        $children = $node->childNodes;
        $content = '';
        for ($j = 0; $j < $children->length; $j += 1) {
            $child = $children->item($j);
            $content .= escapeRecursively($child);
        }
        // Replace the children with a single text node holding the escaped markup.
        $node->nodeValue = htmlspecialchars($content);
    }
    return $doc->saveHTML();
}
$string = '<h1>Test</h1> <pre>Some <em>interesting</em> text</pre>';
echo escapePreformattedCode($string);
Yields:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><h1>Test</h1> <pre>Some &lt;em&gt;interesting&lt;/em&gt; text</pre></body></html>
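One thing the example input does not exercise: escapeRecursively rebuilds each tag from its nodeName alone, so attributes such as class would be lost. Below is a minimal sketch of a variant that keeps them. The name serializeWithAttributes is hypothetical and not part of the answer above, and attribute values are written back without escaping quotes, which is a simplification.

```php
<?php
// Hypothetical variant of escapeRecursively() that also re-emits attributes.
function serializeWithAttributes(DOMNode $node): string {
    // Text nodes contribute their decoded text unchanged.
    if ($node instanceof DOMText) {
        return $node->textContent;
    }
    // Comments and other non-element nodes are skipped in this sketch.
    if (!($node instanceof DOMElement)) {
        return '';
    }
    $content = '<' . $node->nodeName;
    // Assumption: attribute values contain no double quotes.
    foreach ($node->attributes as $attr) {
        $content .= ' ' . $attr->nodeName . '="' . $attr->nodeValue . '"';
    }
    $content .= '>';
    foreach ($node->childNodes as $child) {
        $content .= serializeWithAttributes($child);
    }
    return $content . '</' . $node->nodeName . '>';
}

$attrDoc = new DOMDocument();
$attrDoc->loadHTML('<pre>Some <em class="hl">interesting</em> text</pre>');
$em = $attrDoc->getElementsByTagName('em')->item(0);
echo serializeWithAttributes($em), "\n";
```

The result would still need to go through htmlspecialchars before being written back into the <pre> node, exactly as in escapePreformattedCode.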
Note that a DOM always represents a complete document. Hence, when the DOM parser gets a document fragment, it fills in the missing information (here the doctype and the <html> and <body> wrappers). This makes the output potentially different from the input.
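If only the fragment is wanted back, one common approach is to serialize the children of <body> one by one instead of the whole document. The sketch below assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument; bodyInnerHtml is a hypothetical helper name, not something from the answer above.

```php
<?php
// Hypothetical helper: return only the markup inside <body>,
// dropping the doctype and the <html>/<body> wrappers the parser added.
function bodyInnerHtml(DOMDocument $doc): string {
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes just that node and its subtree.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$fragmentDoc = new DOMDocument();
$fragmentDoc->loadHTML('<h1>Test</h1> <pre>Some text</pre>');
$fragment = bodyInnerHtml($fragmentDoc);
echo $fragment, "\n";
```

In escapePreformattedCode, the final `return $doc->saveHTML();` could be swapped for a call like this when the caller passes in a fragment rather than a full page.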
Upvotes: 2