kostix

Reputation: 55563

What are the semantics of \xNN escapes inside XML CDATA blocks?

I have to deal with XML documents [1] which contain certain textual data (such as error messages) represented using CDATA blocks containing \xNN-style escapes for non-ASCII characters of a single-byte character set (Windows-1251 in my case).

An example is

<TEXT><![CDATA[\xd1\xe0\xe9\xf2 \xe2\xf0\xe5\xec\xe5\xed\xed\xee ...]]></TEXT>

where each \xNN sequence hex-encodes a single character of the Windows-1251 repertoire.

What I fail to gather from the XML 1.0 specification is what the semantics of the contents of a CDATA block are, other than that it is "character data". So the question is: is an XML parser supposed to process those backslash escapes when parsing such CDATA blocks, taking into account the encoding it parsed from the document's XMLDecl? Or does the presence of such an encoding of the character data have no meaning in XML itself, so that a parser is supposed to return whatever it extracted from a CDATA block "as is", leaving it up to me to decode the escapes further?


[1] Payloads of SOAP responses generated by the web service of the Amadeus E-Retail product.

Upvotes: 0

Views: 607

Answers (1)

Quentin

Reputation: 944306

It's up to you to decode them. The \ character does not represent an escape sequence in XML (and even if it did, CDATA would almost certainly be designed to make that character literal as well).
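
To illustrate what "decoding them yourself" could look like, here is a minimal Python sketch. It assumes the escapes are always exactly `\x` followed by two hex digits and that every escaped byte belongs to Windows-1251, as described in the question. The helper name `decode_escapes` is made up for this example and is not part of any XML library.

```python
import re
import xml.etree.ElementTree as ET

# Sample element from the question; the backslashes are literal characters.
doc = r'<TEXT><![CDATA[\xd1\xe0\xe9\xf2 \xe2\xf0\xe5\xec\xe5\xed\xed\xee ...]]></TEXT>'

# Matches a literal backslash, "x", and exactly two hex digits.
_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def decode_escapes(text, encoding='cp1251'):
    """Hypothetical helper: replace each literal \\xNN with the character
    that the single byte NN maps to in `encoding`."""
    def replace(match):
        byte = bytes([int(match.group(1), 16)])
        return byte.decode(encoding)
    return _ESCAPE.sub(replace, text)

# The parser returns the CDATA content untouched, backslashes included.
raw = ET.fromstring(doc).text

# The application-level step: turn the escapes into real characters.
decoded = decode_escapes(raw)
print(decoded)
```

The parser only strips the CDATA markers; `raw` still begins with the four characters `\xd1`. The escape handling is purely an application-level convention layered on top of XML, so the encoding named in the XMLDecl plays no part in it.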

Upvotes: 3
