gtilflm

Reputation: 1465

Issue with Output from DOMDocument

I'm using DOMDocument to strip some unwanted HTML elements (and their contents) out of the PHP variable $table_data_for_db, which arrives as raw HTML. The remaining HTML is then saved to my DB.

Here's the code I'm using to create $table_data_for_db...

    $table_data_for_db = $_POST['table_data'];

    $dom = new DOMDocument;
    $dom->loadHTML($table_data_for_db);
    $xPath = new DOMXPath($dom);

    // Remove the first element matching each unwanted id, along with its contents.
    foreach (['problem_header', 'border_row', 'fraction_class'] as $id) {
        $node = $xPath->query('//*[@id="' . $id . '"]')->item(0);
        if ($node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize what is left back to an HTML string for the DB.
    $table_data_for_db = $dom->saveHTML();

The problem is that I'm getting output like this... More of the same...

The <!DOCTYPE html..., <html><head> and </head></html> parts are undesirable.

I currently have a solution in place where I use str_replace to get rid of the undesirables before inserting into the DB, but that feels like a hack. Is there a better way to do this?

Upvotes: 0

Views: 48

Answers (1)

kojow7

Reputation: 11384

Why did you delete your other post? If you wanted to change your question, you could have just used the edit function. Anyhow, my answer to your other one is as follows:

The extra markup shows up in the saveHTML output because loadHTML wraps an HTML fragment in the implied html/body elements and adds a default doctype when none is present. To prevent that, pass these options to loadHTML:

    $dom->loadHTML($table_data_for_db, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

According to http://php.net/manual/en/libxml.constants.php, these constants require at least PHP 5.4 and Libxml 2.7.8.
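For reference, here is a minimal sketch of how the whole thing might look with those flags in place. The function name strip_elements_by_id and the sample markup are made up for illustration, not taken from the question, and the type hints assume PHP 7 or later. It also assumes the input has a single top-level element (such as one table); with LIBXML_HTML_NOIMPLIED, a fragment that has several top-level elements can come back nested in unexpected ways.

```php
<?php
// Hypothetical helper (not from the original post): removes the first element
// matching each id and returns the remaining markup without wrapper tags.
function strip_elements_by_id(string $html, array $ids): string
{
    // loadHTML rejects an empty string, so return early in that case.
    if ($html === '') {
        return '';
    }

    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xPath = new DOMXPath($dom);
    foreach ($ids as $id) {
        // Assumes $id is a trusted, hard-coded value with no quote characters.
        $node = $xPath->query('//*[@id="' . $id . '"]')->item(0);
        if ($node !== null && $node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // saveHTML appends a trailing newline, so trim it before storing.
    return trim($dom->saveHTML());
}

// Sample input is invented for illustration only.
$sample = '<table><tr id="problem_header"><td>Header</td></tr>'
        . '<tr><td>Keep me</td></tr></table>';
echo strip_elements_by_id($sample, ['problem_header', 'border_row', 'fraction_class']);
```

With this in place, the str_replace cleanup step should no longer be needed, since the doctype and wrapper tags are never added to the document in the first place.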

Upvotes: 1
