Reputation: 41308
I have some xml that looks like this:
<rootElement attribute=' > '/>
This is accepted as well-formed xml by the parsers I've tried it on, and the relevant part of the XML specification also suggests this is valid, although I personally wasn't convinced it was until I checked (interestingly enough, this wouldn't be valid if it were an opening angle bracket, but it is as a closing one).
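As a rough way to check the well-formedness claim, here is a minimal sketch that feeds both variants to XElement.Parse. The class and helper names (WellFormednessCheck, IsWellFormed) are made up for illustration; it only assumes that the parser throws XmlException on input that is not well-formed.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormednessCheck
{
    // Hypothetical helper: true if the parser accepts the string as well-formed xml.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // A literal '>' inside an attribute value is accepted.
        Console.WriteLine(IsWellFormed("<rootElement attribute=' > '/>")); // True
        // A literal '<' inside an attribute value is rejected.
        Console.WriteLine(IsWellFormed("<rootElement attribute=' < '/>")); // False
    }
}
```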
I have some code that is used to "pretty print" xml - it should only change line-lengths and new lines - it shouldn't change any content. However, no matter how I try to parse this xml, it always ends up being entity replaced:
<rootElement attribute=' &gt; '/>
This isn't entirely unexpected, and any xml parser should treat the two as identical, but for my purposes I don't want this behaviour as this is code meant to change the formatting of an xml file only, not its contents.
It doesn't matter if I load my xml into an XmlDocument:
var xml = "<rootElement attribute=' > '/>";
var doc = new XmlDocument();
doc.LoadXml(xml);
Console.WriteLine(doc.OuterXml);
Or an XElement:
var xElement = XElement.Parse(xml);
xElement.Save(Console.Out);
Or pass it through a reader/writer pair:
using (var ms = new MemoryStream())
using (var streamWriter = new StreamWriter(ms))
{
    streamWriter.Write(xml);
    streamWriter.Flush();
    ms.Position = 0;
    using (var xmlReader = XmlReader.Create(ms))
    {
        xmlReader.Read();
        Console.WriteLine(xmlReader.ReadOuterXml());
    }
}
They all replace the > character with the &gt; entity, even though the former is acceptable in well-formed xml. I've tried playing with the various XmlReaderSettings, XElement's LoadOptions, etc., but all to no avail.
Does anyone know of any way to prevent this?
This is more of a curiosity than an actual issue, but I am interested to see if anyone has any solutions.
[EDIT to clarify, in the light of some comments/answers]
I really do realise that this behaviour is expected. In my case, maybe I don't want to use one of the built in xml APIs at all (although whatever I use needs to understand the structure of xml so as not to line break in inappropriate places where it changes the semantic meaning of the document.)
I'm really just interested to know if anyone knows of a way to change the behaviour in these parsers (I expect you can't but figured if anyone knew, they'd probably be on SO), or if anyone has any other ideas.
Upvotes: 2
Views: 3410
Reputation: 10988
The interesting thing is that xr.GetAttribute("attribute") returns " > ", as you would expect. My guess is that when ReadOuterXml creates the XML, it encodes every > as &gt;. So to beat the issue, you would have to process each node as it occurs and pretty print it yourself.
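A minimal sketch of what that node-by-node approach could look like for a start tag, assuming you write the markup by hand instead of going through an XmlWriter. ManualAttributeWriter, EscapeAttribute and WriteStartTag are hypothetical names, and the sketch always uses single quotes. Note the limits: the reader hands back the same value for > and &gt;, so this writes a raw > in both cases rather than preserving whichever form the source used, and it does not re-escape whitespace characters the reader has already normalized.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ManualAttributeWriter
{
    // Escapes only the characters that must be escaped inside a
    // single-quoted attribute value; '>' is deliberately left alone.
    public static string EscapeAttribute(string value)
    {
        return value.Replace("&", "&amp;")
                    .Replace("<", "&lt;")
                    .Replace("'", "&apos;");
    }

    // Rebuilds the start tag of the element the reader is positioned on.
    public static string WriteStartTag(XmlReader reader)
    {
        var sb = new StringBuilder();
        sb.Append('<').Append(reader.Name);

        // Capture this before moving onto the attributes.
        bool isEmpty = reader.IsEmptyElement;

        if (reader.MoveToFirstAttribute())
        {
            do
            {
                sb.Append(' ').Append(reader.Name)
                  .Append("='").Append(EscapeAttribute(reader.Value)).Append('\'');
            } while (reader.MoveToNextAttribute());

            // Return to the owning element so the caller can keep reading.
            reader.MoveToElement();
        }

        sb.Append(isEmpty ? "/>" : ">");
        return sb.ToString();
    }

    public static void Main()
    {
        var xml = "<rootElement attribute=' > '/>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            Console.WriteLine(WriteStartTag(reader)); // <rootElement attribute=' > '/>
        }
    }
}
```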
Upvotes: 0
Reputation: 1501013
My guess is that you'll find there isn't a way to change this - as I strongly suspect that the internal representation after loading will be the same whether it's originally > or &gt;.
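One quick way to test that guess, as a minimal sketch: load both spellings into an XmlDocument and compare the attribute values that come back. SameRepresentation and LoadAttribute are hypothetical names used only for this illustration.

```csharp
using System;
using System.Xml;

public static class SameRepresentation
{
    // Hypothetical helper: parses the xml and returns the value of "attribute"
    // on the root element as the DOM sees it.
    public static string LoadAttribute(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("attribute");
    }

    public static void Main()
    {
        string raw = LoadAttribute("<rootElement attribute=' > '/>");
        string escaped = LoadAttribute("<rootElement attribute=' &gt; '/>");

        // Both spellings yield the same in-memory value, so nothing is left
        // for a writer to tell them apart.
        Console.WriteLine(raw == escaped); // True
    }
}
```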
Upvotes: 2