jglouie

Reputation: 12880

Is there a way to disable or modify the strictness of the .NET XML Parser?

I have a slightly malformed XML file that I'm trying to parse in .NET. Other parsers can consume this same file; that is, they're more tolerant of user error.

The XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<kml>
<Document id="12345">
  <name>My name</name>
  <description>My Description</description>
  <myns:author>
    <myns:name>My Name</myns:name>
  </myns:author>
</Document>
</kml>

I load it like this:

XmlDocument doc = new XmlDocument();
doc.Load(myFilePath);

This second line rightfully throws an exception:

'myns' is an undeclared prefix. Line 6, position 4.

From an application point of view, we are acting mostly as a conduit to another application that is able to deal with this slightly wrong XML file. We do not want to reject XML that this third-party application is capable of processing.

Is there a way to disable or modify the strictness of the .NET XML Parser?

Upvotes: 1

Views: 328

Answers (3)

Michael Kay

Reputation: 163342

All the previous answers, surprisingly, are wrong.

Your document is well-formed XML, but it is not namespace-well-formed XML. That is, it conforms to the XML recommendation but not to the Namespaces in XML recommendation, so you will be able to parse it if you can find a parser that allows namespace processing to be switched off. I don't know if the Microsoft XML parser has such an option, but I don't see one here:

http://msdn.microsoft.com/en-US/library/9khb6435(v=vs.80).aspx
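
For what it's worth, the legacy XmlTextReader class exposes a Namespaces property that looks like the kind of switch described above. The following is only a minimal sketch under that assumption: the helper name LoadIgnoringNamespaces is hypothetical, and the behaviour should be checked against the framework version actually in use.

```csharp
using System.IO;
using System.Xml;

static class LenientXml
{
    // Hypothetical helper: loads XML with namespace processing switched off,
    // so an undeclared prefix such as "myns" is not treated as an error.
    public static XmlDocument LoadIgnoringNamespaces(TextReader input)
    {
        var reader = new XmlTextReader(input);
        // Must be set before the first Read(); the colon in "myns:author"
        // is then treated as an ordinary name character.
        reader.Namespaces = false;

        var doc = new XmlDocument();
        doc.Load(reader);
        return doc;
    }
}
```

It could be called as LenientXml.LoadIgnoringNamespaces(File.OpenText(myFilePath)). With namespace support off, the resulting nodes carry no namespace information, so any later namespace-aware processing (XPath with prefixes, for instance) would not see myns as a real namespace.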

Upvotes: 2

Jon Skeet

Reputation: 1500903

Is there a way to disable or modify the strictness of the .NET XML Parser?

Schema validation and things like that are somewhat optional, but this is simply invalid XML. XML parsers generally are this strict, and should be. The fact that the downstream application is capable of handling this is a worrying sign, in itself, IMO.

Options:

  • (Best) Fix whatever's producing the source "XML" - if you're responsible for the code, then just use an XML API. Generally, if you write with an XML API, it will do the right thing.
  • (Not too bad) Write an intermediate step to fix the bad XML before it goes through your main code. For example, if it's just a matter of the myns namespace prefix being undeclared, you could fix that by declaring it in the root element. You'd probably want to load the file line by line, just changing the second one (the root element's start tag).
  • (Worst, probably) Don't even try to parse it as an XML file. Just treat it as raw text.
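
The "not too bad" option could be sketched roughly as follows. This is an illustration only: the namespace URI urn:example:myns is a made-up placeholder (the real URI for myns is not known here), the helper name FixAndLoad is hypothetical, and instead of working line by line it simply replaces the first occurrence of the bare <kml> start tag.

```csharp
using System;
using System.Xml;

static class XmlRepair
{
    // Placeholder URI, assumed for illustration; substitute the real one.
    public const string PlaceholderNs = "urn:example:myns";

    // Hypothetical helper: declares the myns prefix on the root element,
    // then parses the repaired text with the normal strict parser.
    public static XmlDocument FixAndLoad(string rawXml)
    {
        const string root = "<kml>";
        int i = rawXml.IndexOf(root, StringComparison.Ordinal);
        if (i >= 0)
        {
            // Swap the bare root tag for one that declares the prefix.
            rawXml = rawXml.Remove(i, root.Length)
                           .Insert(i, "<kml xmlns:myns=\"" + PlaceholderNs + "\">");
        }

        var doc = new XmlDocument();
        doc.LoadXml(rawXml);
        return doc;
    }
}
```

A caller would pass in something like File.ReadAllText(myFilePath). Note that this changes the document: if it is forwarded downstream after being saved from the XmlDocument, the other application will see the added declaration.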

Upvotes: 6

harpo

Reputation: 43168

A conformant XML processor (including the .NET API) does not distinguish between degrees of well-formedness, however "slight." Input is either well-formed, or it's not.

Depending on what you want to do with the document, you have different options for handling it, but all will involve some kind of modification, or System.Xml and company will be of no use here.

Upvotes: 2
