I'm using `DOMDocument` to find and strip out some HTML elements I don't want in the PHP variable `$table_data_for_db`. The raw version of `$table_data_for_db` comes with some HTML tags I don't want, so I'm using the code below to get rid of those tags (and the content inside them), then saving the remaining HTML to my DB.
Here's the code I'm using to create `$table_data_for_db`:

```php
$table_data_for_db = $_POST['table_data'];

$dom = new DOMDocument;
$dom->loadHTML($table_data_for_db);
$xPath = new DOMXPath($dom);

// Remove the element with id="problem_header" (and its content), if present
$nodes = $xPath->query('//*[@id="problem_header"]');
if ($nodes->item(0)) {
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));
}

// Same for id="border_row"
$nodes = $xPath->query('//*[@id="border_row"]');
if ($nodes->item(0)) {
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));
}

// Same for id="fraction_class"
$nodes = $xPath->query('//*[@id="fraction_class"]');
if ($nodes->item(0)) {
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));
}

// Serialise what is left, ready for the DB insert
$table_data_for_db = $dom->saveHTML();
```
The problem is that I'm getting output like this... More of the same...

where the `<!DOCTYPE html...`, `<html><head>` and `</head></html>` parts are undesirable.
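A minimal repro of the behaviour described above might look like this. The `$fragment` value is a hypothetical stand-in for `$_POST['table_data']`; the exact wrapper elements depend on the content, and for a bare `<table>` they are typically `<html><body>`.

```php
<?php
// Hypothetical input standing in for $_POST['table_data']
$fragment = '<table><tr id="problem_header"><td>Header</td></tr><tr><td>Row</td></tr></table>';

// Keep libxml parser warnings out of the script output
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($fragment); // no flags, as in the question
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser has added a default doctype and implied wrapper elements
// around the fragment, and saveHTML() writes all of them back out.
$output = $dom->saveHTML();
echo $output;
```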
I currently have a solution in place where I use `str_replace` to get rid of the undesirable parts before inserting into the DB, but that feels like a hack. Is there a better way to do this?
Why did you delete your other post? If you wanted to change your question, you could have used the edit function. Anyhow, my answer to your other one is as follows:
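The extra markup is added when `loadHTML` parses the fragment (libxml inserts a default doctype and implied wrapper elements such as `<html>`), and `saveHTML` then writes it back out. To stop it from being added in the first place, pass these options to `loadHTML`: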
```php
$dom->loadHTML($table_data_for_db, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
```
According to http://php.net/manual/en/libxml.constants.php, you will need at least PHP 5.4 and Libxml 2.7.8.
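Putting the flags together with the removal code from the question, a sketch might look like the following. The helper name `strip_elements_by_id` and the sample input are hypothetical. It assumes the fragment has a single root element (like the `<table>` here), since `LIBXML_HTML_NOIMPLIED` is reported to misbehave on fragments with several top-level siblings, and it assumes the ids contain no double quotes, because they are concatenated into the XPath query.

```php
<?php
/**
 * Hypothetical helper: removes the first element with each given id and
 * returns the remaining markup without a doctype or implied wrappers.
 * Requires PHP >= 5.4 with libxml >= 2.7.8 for the two flags.
 * Assumes ids contain no double quotes (they are inlined into the XPath).
 */
function strip_elements_by_id($html, array $ids)
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings

    $dom = new DOMDocument();
    // Stop the parser from adding <html>/<body> and a default doctype
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xPath = new DOMXPath($dom);
    foreach ($ids as $id) {
        // Same query as the question, built from the id list
        $node = $xPath->query('//*[@id="' . $id . '"]')->item(0);
        if ($node !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom->saveHTML();
}

$cleaned = strip_elements_by_id(
    '<table><tr id="problem_header"><td>Header</td></tr><tr><td>Row</td></tr></table>',
    array('problem_header', 'border_row', 'fraction_class')
);
echo $cleaned;
```

Looping over an id list keeps the three copy-pasted query/remove blocks in one place, so adding another element to strip is a one-word change.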
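If the libxml version on the server is too old for those flags, a different technique (not the one from the answer above) is to parse normally and serialise only the children of `<body>`, passing each node to `saveHTML()` (the node argument needs PHP 5.3.6 or later). This sketch assumes the fragment ends up inside `<body>`, as a `<table>` does; anything the parser places in `<head>` would be dropped. The helper name `inner_body_html` and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: serialise only the children of <body>, so the
// doctype and the implied <html>/<body> wrappers are never written out.
function inner_body_html(DOMDocument $dom)
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child); // node argument: PHP >= 5.3.6
    }
    return $html;
}

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<table><tr id="problem_header"><td>Header</td></tr><tr><td>Row</td></tr></table>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Remove unwanted elements exactly as in the question
$xPath = new DOMXPath($dom);
$node = $xPath->query('//*[@id="problem_header"]')->item(0);
if ($node !== null) {
    $node->parentNode->removeChild($node);
}

$result = inner_body_html($dom);
echo $result;
```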