Reputation: 6726
My C# application loads XML documents using the following code:
XmlDocument doc = new XmlDocument();
doc.Load(path);
Some of these documents contain encoded characters, for example:
<xsl:text>&#10;</xsl:text>
I notice that when these documents are loaded, the &#10; gets dropped.
My question: How can I preserve <xsl:text>&#10;</xsl:text>?
FYI - The XML declaration used for these documents:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Upvotes: 1
Views: 866
Reputation: 415665
Are you sure the character is dropped? Character 10 is just a line feed, so it wouldn't exactly show up in your debugger window. It could also be treated as whitespace. Have you tried playing with the whitespace settings on your XmlDocument, such as the PreserveWhitespace property?
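As a minimal sketch of the whitespace setting mentioned above: XmlDocument exposes a PreserveWhitespace property, which has to be set before the document is loaded. The inline stylesheet string and the names WhitespaceDemo and LoadTextValue are made up for illustration, and the sample assumes the character in question is written as the numeric reference &#10;.

```csharp
using System;
using System.Xml;

public static class WhitespaceDemo
{
    // Hypothetical sample input; a real application would call doc.Load(path) instead.
    private const string Stylesheet =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:text>&#10;</xsl:text>" +
        "</xsl:stylesheet>";

    // Loads the sample and returns the text content of the xsl:text element.
    public static string LoadTextValue(bool preserveWhitespace)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace; // must be set before loading
        doc.LoadXml(Stylesheet);

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("xsl", "http://www.w3.org/1999/XSL/Transform");

        XmlNode text = doc.SelectSingleNode("/xsl:stylesheet/xsl:text", ns);
        return text.InnerText;
    }

    public static void Main()
    {
        string value = LoadTextValue(true);
        // Print the code points rather than the raw text, since a line feed is invisible.
        foreach (char c in value)
        {
            Console.WriteLine((int)c);
        }
    }
}
```

Printing the code points instead of the string itself sidesteps the debugger-visibility problem: a line feed looks like nothing in a watch window but shows up as a number here.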
If you need to preserve the encoding you only have two choices: a CDATA section or reading the file as plain text rather than XML. I suspect you have no control over the documents that come into the system, which eliminates the CDATA option.
Plain text rather than XML is probably distasteful as well, but it's all you have left. If you need to do validation or other processing, you could first load and verify the XML, and then concatenate your files using simple file streams as a separate step. Again: not ideal, but it's all that's left.
Upvotes: 2
Reputation: 100248
Maybe it would be better to keep the data in a <![CDATA[ ... ]]> section?
http://www.w3schools.com/XML/xml_cdata.asp
Upvotes: 0
Reputation: 1500065
&#10; is a linefeed - i.e. whitespace. The XML parser will load it in as a linefeed, and thereafter ignore the fact that it was originally encoded. The encoding is just part of the serialization of the data to text format - it's not part of the data itself.
Now, XML sometimes ignores whitespace and sometimes doesn't, depending on context, API etc. As Joel says, you may find that it's not missing at all - or you may find that using it with an API which allows you to preserve whitespace fixes the problem. I wouldn't be at all surprised to see it turned into an unencoded linefeed character when you output the data, though.
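To illustrate the point that the encoding belongs to the serialization and not to the data, here is a small sketch that parses three spellings of the same character and compares what the DOM ends up holding. The element name t and the names CharRefDemo and ParsedText are hypothetical; PreserveWhitespace is switched on so the whitespace-only content is kept.

```csharp
using System;
using System.Xml;

public static class CharRefDemo
{
    // Parses a one-element document and returns the element's text as the DOM sees it.
    public static string ParsedText(string content)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml("<t>" + content + "</t>");
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        string dec = ParsedText("&#10;"); // decimal character reference
        string hex = ParsedText("&#xA;"); // hexadecimal character reference
        string raw = ParsedText("\n");    // literal line feed

        // All three spellings resolve to the same single character once parsed.
        Console.WriteLine(dec == hex && hex == raw);
    }
}
```

Since the parsed document only holds the character itself, a serializer has no record of which spelling the source used, which is why the reference may come back out as a plain line break.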
Upvotes: 1