Dot Net Dev

Reputation: 614

XElement Parse error when trying to parse string

I am getting an XML parse error while trying to parse a string (with CDATA within CDATA):

var cont = "<op><![CDATA[someData<p><![CDATA[someotherData]]></p></op>";
XElement.Parse(cont);

Error:

The 'op' start tag on line 1 position 2 does not match the end tag of 'p'. Line 1, position 52.

Can we have CDATA within CDATA? If we can, then why am I getting the error?

The code below works fine (it does not contain CDATA within CDATA).

var cont = "<op><![CDATA[someData]]></op>";
XElement.Parse(cont);

Upvotes: 0

Views: 675

Answers (1)

itminus

Reputation: 25350

1  <op>
2      <![CDATA[
3          someData
4          <p>
5              <![CDATA[someotherData]]>
6          </p>
7  </op>

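When the XML parser encounters the ]]> on line 5, it terminates the first <![CDATA[ it met on line 2. Everything after that point is parsed as markup again, so the </p> on line 6 is read as an end tag while <op> is still the open element, which is exactly the mismatch the error reports. As a result, you can never nest a CDATA section within another CDATA section.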
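To see this behaviour without crashing the caller, the parse can be wrapped in a try/catch. This is only a minimal sketch: the class name NestedCDataCheck and the helper TryParse are hypothetical names made up for illustration, while XElement.Parse and XmlException are the real LINQ to XML and System.Xml types.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, used only to illustrate the failure mode.
public static class NestedCDataCheck
{
    // Returns true when the text parses as XML; otherwise returns false
    // and hands back the parser's message.
    public static bool TryParse(string xmlText, out string error)
    {
        try
        {
            XElement.Parse(xmlText);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // The string from the question: the first "]]>" closes the outer CDATA,
        // so "</p>" is parsed as an end tag that does not match "<op>".
        var nested = "<op><![CDATA[someData<p><![CDATA[someotherData]]></p></op>";
        if (!TryParse(nested, out var error))
        {
            Console.WriteLine("Parse failed: " + error);
        }

        // A single, properly terminated CDATA section parses without error.
        var flat = "<op><![CDATA[someData]]></op>";
        Console.WriteLine(TryParse(flat, out _));
    }
}
```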

CDATA is not designed to hold XML elements, but to hold character data that might contain characters such as < and >. It lets us avoid escaping them as &lt; and &gt; respectively, and write and display them cleanly.

So the content between <![CDATA[ and ]]> is treated as plain text, with no further processing, even if it looks like it contains a hierarchy. In other words, it is just a string. Let's take your code as an example:

var cont = "<op><![CDATA[ <foo><bar></bar></foo> ]]></op>";
var xml=XElement.Parse(cont);

Here the FirstNode of xml will be a CDATA node whose value is the plain text <foo><bar></bar></foo> (including the surrounding spaces). It has no child nodes of its own, and xml contains no child elements.
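This can be checked directly. The sketch below is illustrative only: CDataInspection and FirstCDataValue are hypothetical names, while XCData, FirstNode, HasElements and Value are existing LINQ to XML members.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper class, used only to inspect what the parser produced.
public static class CDataInspection
{
    // Returns the text of the first child node if it is a CDATA section, otherwise null.
    public static string FirstCDataValue(string xmlText)
    {
        XElement element = XElement.Parse(xmlText);
        return (element.FirstNode as XCData)?.Value;
    }

    public static void Main()
    {
        var cont = "<op><![CDATA[ <foo><bar></bar></foo> ]]></op>";
        XElement xml = XElement.Parse(cont);

        // The only child is a CDATA text node, not an element tree.
        Console.WriteLine(xml.FirstNode is XCData);
        Console.WriteLine(xml.HasElements);

        // The markup-looking content comes back as an ordinary string.
        Console.WriteLine(FirstCDataValue(cont));
    }
}
```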

Since the parser always treats the data between <![CDATA[ and ]]> as a plain string, there is no standard way to nest one CDATA section inside another. The closest workaround is to encode the inner data before embedding it and decode it after parsing. For example, we can URL-encode the data:

string xmlstr= @"<op><![CDATA[
    <helloworld/>
    someData%0A%3Cp%3E%0A%3C!%5BCDATA%5BsomeotherData%5D%5D%3E%0A%3C%2Fp%3E
]]></op>";
var xml = XElement.Parse(xmlstr);

var subxmlString = System.Web.HttpUtility.UrlDecode(xml.Value);
// Wrap the decoded fragment in a root element so it parses as a single document
var subxml = XElement.Parse($"<root>{subxmlString}</root>");
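The encoding side can be done in code as well, rather than by hand. The following is a rough sketch of a full round trip, not the only way to do it. It swaps in Uri.EscapeDataString and Uri.UnescapeDataString instead of HttpUtility so that it does not depend on System.Web, and the names EncodedCDataRoundTrip, Wrap and Unwrap are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper class sketching an encode-then-embed round trip.
public static class EncodedCDataRoundTrip
{
    // Percent-encodes the fragment so it can no longer contain "]]>",
    // then stores it in a single CDATA section under <op>.
    public static string Wrap(string fragment)
    {
        string encoded = Uri.EscapeDataString(fragment);
        return new XElement("op", new XCData(encoded)).ToString();
    }

    // Reads the CDATA text back, decodes it, and parses the result.
    // The <root> wrapper is an assumption made so that a fragment with
    // several top-level nodes still parses as one element.
    public static XElement Unwrap(string xmlText)
    {
        string decoded = Uri.UnescapeDataString(XElement.Parse(xmlText).Value);
        return XElement.Parse("<root>" + decoded + "</root>");
    }

    public static void Main()
    {
        string fragment = "someData<p><![CDATA[someotherData]]></p>";
        string wrapped = Wrap(fragment);
        XElement inner = Unwrap(wrapped);

        // The inner CDATA section survives the trip and is readable again.
        Console.WriteLine(inner.Element("p").Value);
    }
}
```

Both sides have to agree on the encoding. Base64 or any other reversible scheme would work the same way, as long as the encoded text cannot contain ]]>.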

Upvotes: 1
