Reputation: 55563
I have to deal with XML documents [1] which contain certain textual data (such as error messages) represented using CDATA blocks containing \xNN-style escapes for non-ASCII characters of a single-byte character set (Windows-1251 in my case).
An example is
<TEXT><![CDATA[\xd1\xe0\xe9\xf2 \xe2\xf0\xe5\xec\xe5\xed\xed\xee ...]]></TEXT>
where each \xNN sequence hex-encodes a single character of the Windows-1251 repertoire.
What I fail to gather from the XML 1.0 specification is what the semantics are of whatever is contained in a CDATA block, other than that it is "character data". So the question is: is an XML parser supposed to process those backslash escapes when parsing such CDATA blocks, taking into account the encoding it parsed out of the document's XMLDecl? Or does the presence of such an encoding of the character data have no meaning in XML itself, so that a parser is supposed to return whatever it extracted from a CDATA block "as is", and it's up to me to decode it further?
[1] Payloads of SOAP responses generated by the web service of the Amadeus E-Retail product.
Upvotes: 0
Views: 607
Reputation: 944306
It's up to you to decode them. The \ character does not introduce an escape sequence in XML (and even if it did, CDATA would almost certainly be designed to make that character literal as well).
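For illustration, here is a minimal Python sketch of both halves of this: the parser hands back the CDATA content with the backslashes untouched, and a separate application-level step turns the \xNN sequences into text. It uses the standard library's xml.etree.ElementTree and the cp1251 codec; the helper name decode_escapes and the regular expression are my own assumptions, not part of any XML API. It also assumes every \xNN in the payload really is an escape, which the sample alone cannot confirm.

```python
import re
import xml.etree.ElementTree as ET

# The sample from the question; a raw string keeps the backslashes literal.
doc = r"<TEXT><![CDATA[\xd1\xe0\xe9\xf2 \xe2\xf0\xe5\xec\xe5\xed\xed\xee ...]]></TEXT>"

# The XML parser returns the CDATA content verbatim: the text still
# contains a backslash, an "x" and two hex digits per escape.
raw = ET.fromstring(doc).text

# Hypothetical helper: matches a backslash, "x" and exactly two hex digits.
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def decode_escapes(text, encoding="cp1251"):
    """Replace each \\xNN escape with the character that byte NN
    denotes in the given single-byte encoding."""
    def to_char(match):
        # Decoding byte by byte is only valid for single-byte encodings.
        return bytes([int(match.group(1), 16)]).decode(encoding)
    return _ESCAPE.sub(to_char, text)

decoded = decode_escapes(raw)
print(raw)      # still the escaped form
print(decoded)  # readable Cyrillic text
```

Note that a byte with no assigned character in Windows-1251 would raise UnicodeDecodeError here; whether to fail or substitute in that case is a choice the application has to make.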
Upvotes: 3