Istrebitel

Reputation: 3093

Is there a way to make XmlDocument parsing less strict?

I am making a program that will store its data in an XML file. When people write XML they can make subtle mistakes, like ending a comment with an extra - so it looks like <!-- comment --->, or putting </> inside an attribute value. A human can still read such XML just fine, but loading this text into XmlDocument gives a syntax error and the document isn't parsed at all.

Is there a way to make XmlDocument less strict, so that it ignores violations of the standard that don't make the document unparseable? For example, it's clear that <!-- comment ---> is still a comment, even though the extra - at the end is against the specification.

Upvotes: 1

Views: 1219

Answers (2)

voidengine

Reputation: 2579

No. A conforming XML parser is required to reject input that is not well-formed XML; the specification treats well-formedness violations as fatal errors.
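To make this concrete, here is a small sketch (StrictnessDemo and TryLoad are hypothetical names, used only for illustration) that feeds the two mistakes from the question to XmlDocument.LoadXml. Both are rejected with an XmlException, whose LineNumber and LinePosition properties say where the parser gave up.

```csharp
using System.Xml;

// Hypothetical helper for illustration: returns null if XmlDocument accepts the text,
// otherwise the XmlException that LoadXml threw.
public static class StrictnessDemo
{
    public static XmlException TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            // ex.LineNumber and ex.LinePosition point at the spot where parsing stopped.
            return ex;
        }
    }
}
```

For example, TryLoad("<root><!-- comment --></root>") returns null, while TryLoad("<root><!-- comment ---></root>") and TryLoad("<root note=\"a</>b\"/>") both return an exception instead of a document.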

You could try your luck preprocessing the invalid files with Tidy, but it is better to simply make sure the input is valid in the first place.

Here's an example using the Tidy.NET port. Tidy will fix your comments and do some escaping, but a stray opening < will break things more often than not; guessing what was meant in that case is simply too much to ask.

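using System.IO;
using System.Text;
using TidyNet; // Tidy.NET, a .NET port of HTML Tidy

Tidy tidy = new Tidy();
tidy.Options.FixComments = true; // repair malformed comments such as <!-- ... --->
tidy.Options.XmlTags = true;     // treat the input as XML rather than HTML
tidy.Options.XmlOut = true;      // write the result as XML

string invalid = "<root>< <!--comment--->></root>";
MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(invalid));
MemoryStream output = new MemoryStream();
TidyMessageCollection messages = new TidyMessageCollection();
tidy.Parse(input, output, messages);
// TODO check the messages before trusting the output

string repaired = Encoding.UTF8.GetString(output.ToArray());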
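If pulling in Tidy is not an option, the comment case on its own can be patched with a regex before the text goes to XmlDocument. This is a naive sketch (CommentRepair and FixComments are hypothetical names): it assumes comments never appear inside CDATA sections or attribute values, and it only handles the dash rules for comments, not a stray < or </> elsewhere.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites every <!-- ... --> so its body obeys the XML comment rules.
public static class CommentRepair
{
    // Lazily matches from "<!--" to the first "-->", across lines.
    private static readonly Regex Comment = new Regex(@"<!--(.*?)-->", RegexOptions.Singleline);

    public static string FixComments(string xml)
    {
        return Comment.Replace(xml, m =>
        {
            string body = m.Groups[1].Value;
            // "--" is not allowed inside a comment: split every pair with a space.
            while (body.Contains("--"))
                body = body.Replace("--", "- -");
            // The body must not end with '-', which is exactly what "--->" produces.
            body = body.TrimEnd('-');
            return "<!--" + body + "-->";
        });
    }
}
```

With this, "<root><!-- comment ---></root>" becomes "<root><!-- comment --></root>", which XmlDocument accepts. Anything the regex does not understand is left untouched, so the result can still fail to load and should be checked.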

Upvotes: 1

Henk Holterman

Reputation: 273274

No, and that's a good thing.

XML is a strict format; the solution here is to have correct (or corrected) input.

All XML tools are very picky, by design. You might have some luck with an XmlReader and fixing or rejecting faulty elements.
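A sketch of the XmlReader idea, assuming the goal is to reject a bad file with a useful report (XmlPrecheck and FindFirstError are hypothetical names). XmlReader cannot resume after a well-formedness error, so in practice it can only stop and report; "fixing" means editing the text and trying again. Besides line and position, this version tracks which elements were open when the error hit, which makes the mistake easier to find for whoever wrote the file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical pre-check: streams the document and stops at the first well-formedness error.
public static class XmlPrecheck
{
    // Returns null when the whole document is well-formed, otherwise a short error report.
    public static string FindFirstError(string xml)
    {
        var open = new List<string>(); // names of elements currently open
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        // <a/> produces no EndElement, so only track non-empty elements.
                        if (!reader.IsEmptyElement) open.Add(reader.Name);
                    }
                    else if (reader.NodeType == XmlNodeType.EndElement)
                    {
                        open.RemoveAt(open.Count - 1);
                    }
                }
            }
            return null;
        }
        catch (XmlException ex)
        {
            string path = "/" + string.Join("/", open);
            return $"line {ex.LineNumber}, position {ex.LinePosition}, inside {path}: {ex.Message}";
        }
    }
}
```

Note that XmlReader.Create prohibits DTDs by default, so a file with a DOCTYPE would also be reported here; pass an XmlReaderSettings if that matters.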

But it's far better to create the XML with a suitable tool. Quite a few of them are named XmlPad.

Upvotes: 6
