Dave

Reputation: 918

Reading data from an XML file inside HTML CDATA with PHP

I'm trying to read data in this format:

<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>

The data comes from a remote XML file, so I cannot alter it. I've been trying to read this with PHP using:

$file = file_get_contents($r[0]->overview);
$xml = new SimpleXMLElement($file); 

echo '<pre>';
print_r($xml);
echo '</pre>';

This outputs:

SimpleXMLElement Object
(
    [br] => Array
        (
            [0] => SimpleXMLElement Object
                (
                )

            [1] => SimpleXMLElement Object
                (
                )

            [2] => SimpleXMLElement Object
                (
                )

        )

)

I'm unsure how to read the contents; normally I would see an array or object I can loop through.

Any advice would be appreciated.

Upvotes: 1

Views: 1808

Answers (2)

ThW

Reputation: 19502

The problem is just SimpleXML's magic. CDATA sections are a special kind of text node: they let you write special characters (<, >, &, ", ') in XML without escaping them. There are two reasons for this: backwards compatibility for script elements and better human readability.

They are still nodes and can be read as such:

<?php

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);

// iterate over all text child nodes (CDATA included) that are not just whitespace
foreach ($xpath->evaluate('/body/text()[normalize-space(.) != ""]') as $node) {
  var_dump($xpath->evaluate('string(.)', $node));
}

Output: https://eval.in/140237

string(14) "sample content"
string(12) "more content"
string(7) "content"
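
If you would rather not use XPath, the same idea can be sketched by walking the child nodes of the root element and keeping only the CDATA sections. This is a minimal sketch that assumes every piece of content you want sits in a CDATA section directly under the root, as in the sample; the $texts variable is just an illustrative name.

```php
<?php

// Same sample document as above.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Keep only CDATA children of the root element; the whitespace text
// nodes and the <br /> elements fail the instanceof check and are skipped.
$texts = [];
foreach ($dom->documentElement->childNodes as $node) {
    if ($node instanceof DOMCdataSection) {
        $texts[] = $node->data;
    }
}

print_r($texts);
```

This only works as long as the document is loaded without LIBXML_NOCDATA, because that option merges CDATA sections into ordinary text nodes.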

Upvotes: 1

The <![CDATA[sample content]]> should be enclosed in an opening and a closing tag; only then can SimpleXML expose the data as an element value. Also, to read the CDATA content, you should use the LIBXML_NOCDATA option.

Since those CDATA sections were not enclosed in any element, you were getting empty objects.

The fixed code:

<?php

$content = '<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>';

// Drop the self-closing <br />, then wrap each CDATA section in <br>...</br>
$content = str_replace(array('<br />', '<!', ']>'), array('', '<br><!', ']></br>'), $content);
$xml = simplexml_load_string($content, 'SimpleXMLElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);
print_r($xml);

OUTPUT:

SimpleXMLElement Object
(
    [br] => Array
        (
            [0] => sample content
            [1] => more content
            [2] => content
        )

)
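
If rewriting the markup with str_replace feels too fragile, a possible alternative is to leave the XML untouched and cast the root element to a string, which returns the text (CDATA included) that sits directly inside it. The sketch below assumes the CDATA sections are separated by line breaks, as they are in the sample; if the remote feed puts them on one line, this split will not work.

```php
<?php

$content = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>
XML;

$xml = simplexml_load_string($content);

// The string cast yields only the text directly inside <body>;
// the <br /> child elements contribute nothing.
$raw = (string) $xml;

// Assumption: one CDATA section per line in the source document.
$texts = preg_split('/\R+/', trim($raw));

print_r($texts);
```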

Upvotes: 0
