Reputation: 918
I'm trying to read data in this format:
<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>
The data comes from a remote XML file, so I cannot alter it. I've been trying to read it with PHP using:
$file = file_get_contents($r[0]->overview);
$xml = new SimpleXMLElement($file);
echo '<pre>';
print_r($xml);
echo '</pre>';
This outputs:
SimpleXMLElement Object
(
[br] => Array
(
[0] => SimpleXMLElement Object
(
)
[1] => SimpleXMLElement Object
(
)
[2] => SimpleXMLElement Object
(
)
)
)
I'm unsure how to read the contents; normally I would see an array or object that I can loop through.
Any advice would be appreciated.
Upvotes: 1
Views: 1808
Reputation: 19502
The problem is just SimpleXML's magic. CDATA sections are a special kind of text node; they let you write special characters (<, >, &, ", ') in XML without escaping them. There are two reasons for them: backwards compatibility for script elements and better human readability.
They are still nodes and can be read as such:
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>
XML;
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
// iterate over all text child nodes that are not just whitespace
foreach($xpath->evaluate('/body/text()[normalize-space(.) != ""]') as $node) {
var_dump($xpath->evaluate('string(.)', $node));
}
Output: https://eval.in/140237
string(14) "sample content"
string(12) "more content"
string(7) "content"
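As a variation on the XPath approach above, the following is a minimal sketch that walks the root element's child nodes and keeps only the CDATA sections. The helper name `cdata_children` is hypothetical and not part of the original answer; it assumes the document is loaded without `LIBXML_NOCDATA`, so the CDATA sections remain `DOMCdataSection` nodes.

```php
<?php
// Hypothetical helper (not from the original answer): collect the text of
// every CDATA section that is a direct child of the root element.
function cdata_children(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $values = [];
    foreach ($dom->documentElement->childNodes as $node) {
        // CDATA sections stay DOMCdataSection nodes as long as the
        // document is not loaded with LIBXML_NOCDATA.
        if ($node instanceof DOMCdataSection) {
            $values[] = $node->textContent;
        }
    }
    return $values;
}

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>
XML;

print_r(cdata_children($xml));
```

Unlike the XPath version, this skips ordinary text nodes entirely, so the whitespace between the elements never needs filtering.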
Upvotes: 1
Reputation: 68556
The <![CDATA[sample content]]> should be enclosed in an opening and a closing tag; only then can the data be retrieved. Also, to read the CDATA content, you should use the LIBXML_NOCDATA parameter.
Since those CDATA sections did not have any proper enclosing element of their own, you were getting the empty array.
<?php
$content = '<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>';
$content = str_replace(array('<br />','<!',']>'),array('','<br><!',']></br>'),$content);
$xml = simplexml_load_string($content, 'SimpleXMLElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);
print_r($xml);
OUTPUT:
SimpleXMLElement Object
(
[br] => Array
(
[0] => sample content
[1] => more content
[2] => content
)
)
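For comparison, here is a hedged sketch that leaves the markup untouched and reads the text through SimpleXML's string cast. The helper name `root_text_lines` is hypothetical. It assumes, as in the sample, that each CDATA section sits on its own line and contains no line breaks itself; if that does not hold for the real feed, the splitting step would need to change.

```php
<?php
// Hypothetical helper: read the root element's own text (CDATA included)
// via SimpleXML's string cast and return one entry per non-empty line.
function root_text_lines(string $xml): array
{
    $root = simplexml_load_string($xml);

    // Casting an element to string yields its direct text content;
    // the empty <br /> children contribute nothing.
    $text = trim((string) $root);

    // Split on line breaks, trimming surrounding whitespace and
    // dropping empty pieces.
    return preg_split('/\s*\R\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$content = '<?xml version="1.0" encoding="UTF-8"?>
<body>
<![CDATA[sample content]]><br />
<![CDATA[more content]]><br />
<![CDATA[content]]><br /></body>';

print_r(root_text_lines($content));
```

The trade-off is that this relies on the line layout of the input rather than on its element structure, whereas the str_replace approach above rewrites the markup so that each value gets an element of its own.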
Upvotes: 0