Reputation:
I am extracting data from an XML document, and some tags contain data wrapped in CDATA, like this:
<description><![CDATA[Changes (as compared to 8.17) include:
Features:
* Added a ‘Schema Optimizer’ feature. Based on “procedure analyse()” it will propose alterations to data types for a table based on analysis on what data are stored in the table. The feature is available from INFO tab/HTML mode. Refer to documentation for details.
* A table can now be added [...]]]>
</description>
I am already using preg_match to extract data from the description tag. How can I extract the data inside the CDATA section?
Upvotes: 1
Views: 2537
Reputation: 12658
@Pavel Minaev is right: keep regular expressions as a last resort, and for XML always use an XML parser. You can find one in almost every language now. For example, I usually use DOMDocument to parse or create XML in PHP. It is really simple and easy to understand, especially for people like me who use PHP only occasionally.
For example, suppose you want to extract the CDATA from the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message SYSTEM "https://www.abcd.com/dtds/AbcdefMessageXmlApi.dtd">
<message id="9002">
<report>
<![CDATA[id:50121515075540159 sub:001 text text text text text]]>
</report>
<number>353874181931</number>
</message>
Use the following code to extract the CDATA:
// $xml_response holds the XML string shown above; $log is whatever logger you use.
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
if (TRUE != $doc->loadXML($xml_response)) {
    // log the error and/or throw an exception or whatever
}
$response_element = $doc->documentElement;
if ($response_element->tagName == "message") {
    $report_node = $response_element->getElementsByTagName("report");
    if ($report_node != null && $report_node->length == 1) {
        // textContent returns the text inside the CDATA section, without the markers
        $narrative = $report_node->item(0)->textContent;
        $log->debug("CDATA: $narrative");
    } else {
        $log->error("unable to find report tag or multiple report tags found in response xml");
    }
} else {
    $log->error("unexpected root tag (" . $response_element->tagName . ") in response xml");
}
After this runs, the $narrative variable should hold all the text, and don't worry, it will not contain the ugly <![CDATA[ ... ]]> markers themselves.
Happy coding :)
Upvotes: 0
Reputation: 27313
You should use SimpleXML and XPath if you need to extract a complex set of data.
<?php
$string = <<<XML
<?xml version='1.0'?>
<document>
<title>Forty What?</title>
<from>Joe</from>
<to>Jane</to>
<body>
I know that's the answer -- but what's the question?
</body>
</document>
XML;
$xml = simplexml_load_string($string);
print_r($xml);
?>
This would produce output like this:
SimpleXMLElement Object
(
[title] => Forty What?
[from] => Joe
[to] => Jane
[body] =>
I know that's the answer -- but what's the question?
)
So in your case you would just navigate inside your document, which is much easier than regular expressions, isn't it?
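For the CDATA case from the question, a minimal sketch might look like the one below. The <item> wrapper and the shortened description text are assumptions made up for illustration, since the question only shows the <description> element. As far as I know, LIBXML_NOCDATA asks libxml to merge CDATA sections into ordinary text nodes, and casting a SimpleXMLElement to string gives its text content.

```php
<?php
// Hypothetical input modeled on the question; the <item> wrapper is an assumption.
$xml_string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<item>
<description><![CDATA[Changes (as compared to 8.17) include:
Features:
* Added a 'Schema Optimizer' feature.]]>
</description>
</item>
XML;

// Parse the string; LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$item = simplexml_load_string($xml_string, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($item === false) {
    throw new RuntimeException('could not parse the XML');
}

// Select the element with XPath; xpath() returns an array of matching elements.
$nodes = $item->xpath('/item/description');
if (!$nodes) {
    throw new RuntimeException('no description element found');
}

// Casting to string yields the text; trim() drops the surrounding newlines.
$description = trim((string) $nodes[0]);
echo $description, "\n";
```

The result is the plain text of the description, without the CDATA markers, so no regular expression is needed for the extraction step.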
Upvotes: 0
Reputation: 101565
Regardless of the language, don't use regular expressions to parse XML - you will almost certainly get it wrong. Use an XML parser.
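As a rough illustration of why a parser is the safer choice, here is a small PHP sketch using DOMDocument. The <channel>/<item> structure and the sample text are made up for the example. One description uses CDATA and the other uses an escaped entity, and the parser returns plain text for both, which a single regular expression would have to special-case.

```php
<?php
// Hypothetical document: the <channel>/<item> wrappers and the texts are assumptions.
$xml = <<<XML
<channel>
<item><description><![CDATA[First <b>entry</b> & more]]></description></item>
<item><description>Second entry &amp; more</description></item>
</channel>
XML;

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    throw new RuntimeException('could not parse the XML');
}

// Collect the text of every <description>, whether it uses CDATA or entities.
$texts = [];
foreach ($doc->getElementsByTagName('description') as $node) {
    $texts[] = $node->textContent;
}

print_r($texts);
```

Both entries come back as decoded text: the markup inside the CDATA section is kept literally, and the &amp; entity in the second entry becomes a plain ampersand.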
Upvotes: 7