Reputation: 2635
$tags = array(
    "applet" => 1,
    "script" => 1
);
$html = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides warnings about malformed HTML
$xpath = new DOMXPath($dom);
$body = $xpath->query("//body")->item(0);
I'm trying to loop through the "body" of the web page and remove all the unwanted tags listed in the $tags array, but I can't find a way to do it. How can I do this?
Upvotes: 3
Views: 2759
Reputation: 1566
Have you considered HTML Purifier? Writing your own HTML sanitizer is reinventing the wheel, and it isn't easy to get right.
Furthermore, a blacklist approach is a bad idea; see SO/why-use-a-whitelist-for-html-sanitizing.
You may also be interested in reading how to configure the allowed tags and attributes, or in trying the HTML Purifier demo.
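To make the whitelist point concrete, here is a minimal sketch using only PHP's DOM extension. `keepWhitelisted()` is a hypothetical helper, not part of HTML Purifier or PHP. It removes every element inside `<body>` whose tag is not in the allowed list (together with its content) and strips all attributes from the elements that remain. It is only an illustration of "allow what you know, drop everything else"; HTML Purifier also lets you allow specific attributes and validates their values, which this sketch does not attempt.

```php
<?php
// Hypothetical helper: keep only whitelisted elements inside <body>.
// Disallowed elements are removed with their content; all attributes are
// stripped from the elements that survive.
function keepWhitelisted(DOMDocument $dom, array $allowed): void
{
    $xpath = new DOMXPath($dom);
    // The node list returned by query() is a snapshot, so removing
    // nodes while looping over it is safe.
    foreach ($xpath->query('//body//*') as $el) {
        if (!in_array(strtolower($el->nodeName), $allowed, true)) {
            $el->parentNode->removeChild($el);
            continue;
        }
        // attributes is live: remove the first one until none are left.
        while ($el->attributes->length > 0) {
            $el->removeAttributeNode($el->attributes->item(0));
        }
    }
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p onclick="steal()">Hi <b>there</b><script>alert(1)</script></p>'
    . '<iframe src="evil.html"></iframe></body></html>');
libxml_clear_errors();

keepWhitelisted($dom, ['p', 'b']);

// Serialize only the contents of <body>.
$body = $dom->getElementsByTagName('body')->item(0);
$clean = '';
foreach ($body->childNodes as $child) {
    $clean .= $dom->saveHTML($child);
}
echo $clean;
```

Note that anything not explicitly allowed disappears, including tags you never thought of; with a blacklist, those would have slipped through.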
Upvotes: 6
Reputation: 1071
$tags = array(
    "applet" => 1,
    "script" => 1
);
$html = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// $tags is keyed by tag name, so loop over its keys rather than numeric indexes
foreach (array_keys($tags) as $tag) {
    // the node list returned by query() is a snapshot, so removing nodes inside the loop is safe
    foreach ($xpath->query("//" . $tag) as $node) {
        $node->parentNode->removeChild($node);
    }
}
// saveHTML() serializes the result as HTML; saveXML() would serialize it as XML
$string = $dom->saveHTML();
Something like that.
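If you really want to loop through `<body>` only, as in the question, a recursive walk is another option. The sketch below assumes the same `$tags` map (tag name => 1), so `isset()` is enough to test each element; `removeUnwantedTags()` is a hypothetical helper name. Because it starts from `<body>`, a `<script>` in `<head>` is left untouched.

```php
<?php
// Tags to remove, keyed by lowercase tag name (same shape as $tags above).
$tags = array(
    "applet" => 1,
    "script" => 1
);

// Hypothetical helper: recursively walks the children of $node and removes
// every element whose tag name is a key of $tags, together with its content.
function removeUnwantedTags(DOMNode $node, array $tags): void
{
    // childNodes is live, so copy it to an array before removing anything.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMElement && isset($tags[strtolower($child->nodeName)])) {
            $node->removeChild($child);
        } elseif ($child->hasChildNodes()) {
            removeUnwantedTags($child, $tags);
        }
    }
}

$html = '<html><head><script>var keep = 1;</script></head>'
      . '<body><p>Hello<script>alert(1)</script></p><applet code="A.class"></applet></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Only walk <body>, so the <script> in <head> is kept.
$body = $dom->getElementsByTagName('body')->item(0);
if ($body !== null) {
    removeUnwantedTags($body, $tags);
}
$result = $dom->saveHTML();
echo $result;
```

The XPath version is shorter; the walk is handy if you later want to inspect or rewrite other nodes in the same pass.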
Upvotes: 4