Reed

Reputation: 1642

C# Parsing XML in ISO-8859-1

I'm working on a tool for validating XML files grabbed from a mainframe. For reasons beyond my control every XML file is encoded in ISO 8859-1.

<?xml version="1.0" encoding="ISO 8859-1"?>

My C# application uses the System.Xml library to parse the XML and eventually extract the string of a message contained in one of the child nodes.

If I manually remove the XML encoding line, it works just fine, but I'd like to find a solution that doesn't require manual intervention. Are there any elegant approaches to solving this? Thanks in advance.

The exception that is thrown reads as:

'System.Xml.XmlException' occurred in System.Xml.dll. System does not support 'ISO 8859-1' encoding. Line 1, position 31

My code is

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("//fileLocation");

Upvotes: 1

Views: 6904

Answers (1)

Jeppe Stig Nielsen

Reputation: 62002

As Jeroen pointed out in a comment, the encoding should be:

<?xml version="1.0" encoding="ISO-8859-1"?>

not:

<?xml version="1.0" encoding="ISO 8859-1"?>

(the second version is missing the hyphen after ISO).

You can use a StreamReader with an explicit encoding to read the file anyway:

using (var reader = new StreamReader("//fileLocation", Encoding.GetEncoding("ISO-8859-1")))
{
  var xmlDoc = new XmlDocument();
  xmlDoc.Load(reader);
  // ...
}

(from answer by competent_tech in other thread I linked in an earlier comment).

If you do not want the using statement, I guess you can do:

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(File.ReadAllText("//fileLocation", Encoding.GetEncoding("ISO-8859-1")));

Instead of XmlDocument, you can use the XDocument class in the namespace System.Xml.Linq if you reference the assembly System.Xml.Linq.dll (available since .NET 3.5). It has static methods like Load(TextReader) and Parse(string), which you can use in the same way as above.
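
For illustration, here is a minimal sketch of the XDocument variant. The class name Latin1XmlLoader and its Load method are hypothetical names of my own, not part of any library, and the sketch assumes that decoding the file yourself through a StreamReader is acceptable for your files:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper (the name is an assumption, not a framework type).
public static class Latin1XmlLoader
{
    // Decodes the file as ISO-8859-1 ourselves and passes the resulting
    // text reader to XDocument, so the bytes are never decoded based on
    // the encoding name written in the XML declaration.
    public static XDocument Load(string path)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (var reader = new StreamReader(path, latin1))
        {
            return XDocument.Load(reader);
        }
    }
}
```

You would call it as Latin1XmlLoader.Load("//fileLocation") and then query the returned document with the usual LINQ to XML methods such as Root, Element and Descendants.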

Upvotes: 4
