Reputation: 3093
I am making a program that will store its data in an XML file. When people write XML they can make subtle mistakes, like ending a comment with an extra - so that it looks like <!-- comment --->, or adding a </> inside an attribute. Naturally, the XML can still be read all right, but loading this text into XmlDocument gives a syntax error (and it won't be parsed).
Is there a way to make XmlDocument less strict, so that it ignores violations of the standard that do not make the document unparseable? For example, it's clear that <!-- comment ---> is still a comment, even though it contains a - at the end (which is against the specification).
Upvotes: 1
Views: 1219
Reputation: 2579
No, XML parsers are expected to reject input that is not well-formed XML.
You may try your luck preprocessing the invalid files with Tidy, but it is better to simply make sure the input is valid.
Here's an example usage. Tidy will fix your comments and do some escaping, but an extra opening < will break things more often than not; guessing in that case is simply too much to ask.
// Needs System.IO and System.Text, plus the namespace of the Tidy library in use.
Tidy tidy = new Tidy();
tidy.Options.FixComments = true; // repair malformed comments
tidy.Options.XmlTags = true;     // treat the input as XML rather than HTML
tidy.Options.XmlOut = true;      // write the result as XML
string invalid = "<root>< <!--comment--->></root>";
MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(invalid));
MemoryStream output = new MemoryStream();
tidy.Parse(input, output, new TidyMessageCollection());
// TODO check the messages
string repaired = Encoding.UTF8.GetString(output.ToArray());
Upvotes: 1
Reputation: 273274
No, and that's a good thing.
XML is a strict format; the solution here is to have correct (or corrected) input.
All XML tools are very picky, by design. You might have some luck with an XmlReader and fixing or rejecting faulty elements.
But it's far better to create the XML with a suitable tool. Quite a few of them are named XmlPad.
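As a rough illustration of the XmlReader suggestion, here is a minimal sketch that only detects and locates the first well-formedness error, so the input can be rejected or fixed by hand. The class name XmlCheck and the method FindFirstError are made up for this example; XmlReader does not recover from the error, it just reports where it stopped.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlCheck
{
    // Hypothetical helper: returns null when the text is well-formed,
    // otherwise a message with the position of the first error found.
    public static string FindFirstError(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                // Walk the whole document; Read() throws at the first violation.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }

    static void Main()
    {
        // The comment from the question, with the extra '-' before the closing "-->".
        string error = FindFirstError("<root><!-- comment ---></root>");
        Console.WriteLine(error ?? "well-formed");
    }
}
```

The reported line and position can be shown to whoever edited the file, which keeps the parser strict while making the mistake easy to find.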
Upvotes: 6