some guy

Reputation: 722

How to stop PHP DOMDocument::saveXML from inserting "CDATA"?

I'm using PHP to get all the "script" tags from web pages and then append text after each </script>. That text is not always valid HTML, so I can't just use appendChild/replaceChild to add it, unless I'm misunderstanding how replaceChild works.

Anyway, when I do

$script_tags = $doc->getElementsByTagName('script');
$l = $script_tags->length;
// Walk the list backwards and serialize each script element
for ($i = $l - 1; $i > -1; $i--) {
    $script_tags_string = $doc->saveXML($script_tags->item($i));
}

This puts "<![CDATA[" and "]]>" around the contents of the script tag. How can I disable this? Please don't tell me to just delete it afterwards; that's what I'm going to do if I can't find a solution for this.

Upvotes: 3

Views: 1869

Answers (2)

Federico Hoerth

Reputation: 275

One way I've found to fix this:

Before echoing the document, loop over all script tags and use str_replace to swap "<" and ">" for placeholder strings, making sure the placeholders occur only inside script tags. Then store the result of saveXML() in a variable, and finally use str_replace again to turn the placeholders back into "<" and ">".

Here is the code:

<?php
    // First pass: swap "<" and ">" inside every script tag for placeholders
    foreach ($dom->getElementsByTagName('script') as $script) {
        $script->nodeValue = str_replace("<", "ESCAPE_CHAR_LT", $script->nodeValue);
        $script->nodeValue = str_replace(">", "ESCAPE_CHAR_GT", $script->nodeValue);
    }

    // Serialize the document as XHTML
    $output = $dom->saveXML();

    // Second pass: turn the placeholders back into "<" and ">"
    $output = str_replace("ESCAPE_CHAR_LT", "<", $output);
    $output = str_replace("ESCAPE_CHAR_GT", ">", $output);

    // Print the document
    echo $output;
?>

As you can see, you are now free to use "<" and ">" in your scripts.

Hope this helps someone.

Upvotes: 0

Jani Hartikainen

Reputation: 43253

I suspect the CDATA is inserted because the script contents would otherwise be invalid XML.

Have you tried using saveHTML instead of saveXML?
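
As a rough sketch of that suggestion, the snippet below serializes the same script node with both methods so the two outputs can be compared. The sample markup and the variable names ($html, $asXml, $asHtml) are made up for illustration, and passing a node to saveHTML() needs PHP 5.3.6 or later.

```php
<?php
// Hypothetical sample page, used only for illustration.
$html = '<html><head><script>if (a < b) { run(); }</script></head><body></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$script = $doc->getElementsByTagName('script')->item(0);

// XML serialization of the node: this is the call from the question.
$asXml = $doc->saveXML($script);

// HTML serialization of the same node (node argument: PHP >= 5.3.6).
$asHtml = $doc->saveHTML($script);

echo $asXml, "\n";
echo $asHtml, "\n";
```

If the CDATA wrapper is gone in $asHtml, the same call can stand in for saveXML() inside the loop from the question. Keep in mind that saveHTML() writes HTML syntax rather than XHTML, so elements such as br are not self-closed in its output.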

Upvotes: 3
