Reputation: 614
I am getting an XML parse error while trying to parse a string (with CDATA within CDATA):
var cont = "<op><![CDATA[someData<p><![CDATA[someotherData]]></p></op>";
XElement.Parse(cont);
Error:
The 'op' start tag on line 1 position 2 does not match the end tag of 'p'. Line 1, position 52.
Can we have CDATA within CDATA? If we can, why am I getting this error?
The code below works fine (it does not contain CDATA within CDATA):
var cont = "<op><![CDATA[someData]]></op>";
XElement.Parse(cont);
Upvotes: 0
Views: 675
Reputation: 25350
1 <op>
2 <![CDATA[
3 someData
4 <p>
5 <![CDATA[someotherData]]>
6 </p>
7 </op>
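When the XML parser encounters the ]]> in line 5, it terminates the first <![CDATA[ it met in line 2. The inner <![CDATA[ in line 5 is just text at that point, so the </p> in line 6 has no matching start tag. As a result, you can never nest a CDATA section within another CDATA section.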
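To make the termination rule concrete, here is a minimal sketch that splits the string from the question at the first ]]> and then shows that parsing fails. The variable names (cdataText, remainder, parseFailed) are only for illustration, and the manual IndexOf split is a simplification of what a real parser does.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// The string from the question. The inner "<![CDATA[" is just text,
// and the first "]]>" closes the only CDATA section that is open.
string cont = "<op><![CDATA[someData<p><![CDATA[someotherData]]></p></op>";

int start = cont.IndexOf("<![CDATA[", StringComparison.Ordinal) + "<![CDATA[".Length;
int end = cont.IndexOf("]]>", start, StringComparison.Ordinal);

string cdataText = cont.Substring(start, end - start);   // what the parser sees as CDATA text
string remainder = cont.Substring(end + "]]>".Length);   // what is left as markup

Console.WriteLine(cdataText);  // someData<p><![CDATA[someotherData
Console.WriteLine(remainder);  // </p></op>

bool parseFailed = false;
try
{
    XElement.Parse(cont);
}
catch (XmlException)
{
    // "</p>" has no matching "<p>", because that "<p>" was inside the CDATA text.
    parseFailed = true;
}
Console.WriteLine(parseFailed); // True
```

The leftover </p></op> is exactly why the error message complains that the op start tag does not match the end tag of p.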
CDATA is not designed to hold XML elements, but to hold character data that might contain characters such as <, > and so on. It lets us avoid escaping them as &lt; and &gt; respectively, so we can write and display them cleanly.
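As a small illustration of that point, the sketch below parses the same text once as a CDATA section and once with escaped entities, and compares the results. The sample text and the names viaCData and viaEntities are made up for this example.

```csharp
using System;
using System.Xml.Linq;

// Same character data, written two ways.
XElement viaCData = XElement.Parse("<op><![CDATA[a < b && b > c]]></op>");
XElement viaEntities = XElement.Parse("<op>a &lt; b &amp;&amp; b &gt; c</op>");

Console.WriteLine(viaCData.Value);                       // a < b && b > c
Console.WriteLine(viaCData.Value == viaEntities.Value);  // True
```

Both forms carry the same string; CDATA only changes how it is written in the document.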
So the content between <![CDATA[ and ]]> is treated as plain text, with no further processing, even if it looks as though there is a hierarchy. In other words, it is a plain string. Let's take your code as an example:
var cont = "<op><![CDATA[ <foo><bar></bar></foo> ]]></op>";
var xml=XElement.Parse(cont);
Here the FirstNode of xml will be a plain text node (an XCData) whose value is <foo><bar></bar></foo> with its surrounding spaces, and that node has no child nodes of its own.
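You can check this yourself with a quick sketch like the one below. It only uses the cont string from above; the variable name first is mine.

```csharp
using System;
using System.Xml.Linq;

var cont = "<op><![CDATA[ <foo><bar></bar></foo> ]]></op>";
XElement xml = XElement.Parse(cont);

var first = xml.FirstNode;

Console.WriteLine(first is XCData);      // True: the CDATA section is a single text node
Console.WriteLine(first is XContainer);  // False: a text node cannot have children
Console.WriteLine(xml.HasElements);      // False: no <foo> or <bar> elements were created
Console.WriteLine(xml.Value);            // prints the markup-looking text as a plain string
```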
Since the parser will always treat the data between <![CDATA[ and ]]> as a plain string, there is no "standard" valid way to represent nested CDATA. Just encode the inner data when writing and decode it when reading. For example, we can URL-encode the data:
string xmlstr= @"<op><![CDATA[
<helloworld/>
someData%0A%3Cp%3E%0A%3C!%5BCDATA%5BsomeotherData%5D%5D%3E%0A%3C%2Fp%3E
]]></op>";
var xml = XElement.Parse(xmlstr);
var subxmlString=System.Web.HttpUtility.UrlDecode(xml.Value);
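// wrap the decoded fragment in a root element, since it may contain several top-level nodes
// (note: C# interpolation is {x}, not ${x}; a leading $ would end up in the text)
var subxml= XElement.Parse($"<root>{subxmlString}</root>");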
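For completeness, here is a sketch of the full round trip, including the encoding side. It assumes you control both the writer and the reader. I have swapped in Uri.EscapeDataString / Uri.UnescapeDataString instead of HttpUtility so the snippet needs no System.Web reference; that swap, the inner sample string, and the variable names are my assumptions, not part of the original code.

```csharp
using System;
using System.Xml.Linq;

// The data we would have liked to "nest", including its own CDATA section.
string inner = "someData\n<p>\n<![CDATA[someotherData]]>\n</p>";

// Writer side: percent-encode, then store the result in a single CDATA section.
string encoded = Uri.EscapeDataString(inner);
var op = new XElement("op", new XCData(encoded));
string wire = op.ToString();

// Reader side: parse, decode, then parse the decoded fragment under a wrapper root.
XElement parsed = XElement.Parse(wire);
string decoded = Uri.UnescapeDataString(parsed.Value);
XElement subxml = XElement.Parse($"<root>{decoded}</root>");

Console.WriteLine(encoded.Contains("]]>"));             // False: nothing can end the CDATA early
Console.WriteLine(decoded == inner);                    // True
Console.WriteLine(subxml.Element("p")?.Value.Trim());   // someotherData
```

Any reversible encoding that cannot produce ]]> would do (Base64, for instance); percent-encoding just keeps the payload somewhat readable.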
Upvotes: 1