Guillermo Gomez

Reputation: 1785

System.Xml.XmlException: Invalid character in the given encoding

I am using XmlDocument.Load to load the contents of an XML file that has some characters in Thai. The application is erroring out with the following exception.

System.Xml.XmlException: Invalid character in the given encoding. Line 2, position 82.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.InvalidCharRecovery(Int32& bytesCount, Int32& charsCount)
   at System.Xml.XmlTextReaderImpl.GetChars(Int32 maxCharsCount)
   at System.Xml.XmlTextReaderImpl.ReadData()
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.FinishPartialValue()
   at System.Xml.XmlTextReaderImpl.get_Value()
   at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
   at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
   at System.Xml.XmlDocument.Load(XmlReader reader)

The XML file begins with this content: [screenshot of the start of the file, not reproduced here]

Notice the strange character before the closing tag. This content comes from a third party, and I don't have access to the file/content.

My questions are:

  1. Why is the strange character appearing in the content sent to me by the third-party provider?
  2. Is there any way to successfully process the file (load it into the XmlDocument), given that I can't modify its content before processing it?

Upvotes: 2

Views: 4701

Answers (2)

Pavan Chandaka

Reputation: 12811

If you are sure the content really is Thai text, try specifying the correct encoding when you load the file.

One standard character encoding for Thai is ISO 8859-11.

So try loading the document like this:

 using (var reader = new StreamReader("YourXMLFile.xml", Encoding.GetEncoding("iso-8859-11")))
 {
     xmlDoc.Load(reader); // the StreamReader decodes the bytes using the given encoding
 }
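
If you are not sure which encoding the provider actually uses, one option is to try strict UTF-8 first and only fall back to a legacy encoding when the bytes turn out not to be valid UTF-8. The following is a minimal sketch of that idea, not a definitive implementation. The class and method names (`XmlEncodingFallback.Load`) are hypothetical, and the fallback encoding is whatever you pass in; whether the Thai encoding above is the right guess for this file is an assumption you would need to confirm with the provider.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper (not part of the original answer).
public static class XmlEncodingFallback
{
    // Decodes the bytes as strict UTF-8; if that fails, decodes them with
    // the caller-supplied fallback encoding, then loads the resulting text.
    public static XmlDocument Load(byte[] bytes, Encoding fallback)
    {
        string text;
        try
        {
            // throwOnInvalidBytes: true makes invalid byte sequences throw
            // instead of being silently replaced with U+FFFD.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);
            text = strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the legacy encoding instead.
            text = fallback.GetString(bytes);
        }

        // Drop a leading byte order mark if the decoder kept one.
        text = text.TrimStart('\uFEFF');

        var doc = new XmlDocument();
        using (var reader = new StringReader(text))
        {
            doc.Load(reader); // text is already decoded, so no byte decoding happens here
        }
        return doc;
    }
}
```

You would call it with something like `XmlEncodingFallback.Load(File.ReadAllBytes("YourXMLFile.xml"), Encoding.GetEncoding("iso-8859-11"))`. Note that on .NET Core and later, legacy code-page encodings generally have to be enabled first with `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.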

As for your first question, you will probably need to ask the third party to check their code and find out why those unwanted characters appear in the generated XML.
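
Before contacting them, it may help to look at the raw bytes on the line the exception points to (line 2 in your case), so you can show exactly what was received. Here is a rough sketch of a helper that does that. `ByteInspector.HexDumpLine` is a made-up name, and the code assumes lines end with a single LF byte (0x0A), which holds for UTF-8 and single-byte encodings but not for UTF-16.

```csharp
using System;

// Hypothetical diagnostic helper (not part of the original answer).
public static class ByteInspector
{
    // Returns the raw bytes of the given 1-based line as hex, e.g. "63-64-E9".
    // Assumes lines are separated by the LF byte 0x0A.
    public static string HexDumpLine(byte[] bytes, int lineNumber)
    {
        int start = 0;
        for (int line = 1; line < lineNumber; line++)
        {
            int lf = Array.IndexOf(bytes, (byte)0x0A, start);
            if (lf < 0) return string.Empty; // the data has fewer lines than requested
            start = lf + 1;
        }

        int end = Array.IndexOf(bytes, (byte)0x0A, start);
        if (end < 0) end = bytes.Length;      // last line without a trailing LF
        if (end == start) return string.Empty; // empty line

        return BitConverter.ToString(bytes, start, end - start);
    }
}
```

For example, `ByteInspector.HexDumpLine(File.ReadAllBytes("YourXMLFile.xml"), 2)` gives you the bytes of the second line. Any byte of 0x80 or above that does not form a valid sequence in the encoding the file declares is a likely culprit.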

Upvotes: 2

Mick

Reputation: 6864

The data supplied by the third party is not valid XML. I think there are only two options: get the third party to supply valid XML, or strip the invalid characters from the XML and process what you can. You could do something like this:

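// requires: using System.IO; using System.Linq; using System.Xml;
string invalidXml = File.ReadAllText(path);
// keep only the characters that are legal in XML
string validXml = new string(invalidXml.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray());
if (validXml.Length != invalidXml.Length)
{
    // log that invalid characters were removed
}

// process (what you can in) validXml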
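
A slightly more careful variant of the same idea is sketched below, under a couple of assumptions. It records where characters were removed so they can be logged, and it keeps valid surrogate pairs (characters outside the Basic Multilingual Plane) together, since a check that looks at one UTF-16 code unit at a time may otherwise drop them. `XmlSanitizer.Strip` is a hypothetical name. Also keep in mind that `File.ReadAllText` typically replaces undecodable bytes with U+FFFD, which is itself a legal XML character, so a filter like this will not remove it.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper (not part of the original answer).
public static class XmlSanitizer
{
    // Returns the input without characters that are illegal in XML and
    // adds the index of every removed character to removedAt.
    public static string Strip(string input, List<int> removedAt)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (i + 1 < input.Length && char.IsSurrogatePair(c, input[i + 1]))
            {
                // A valid high/low pair encodes a code point above U+FFFF,
                // which is allowed in XML, so keep both halves.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else
            {
                removedAt.Add(i); // remember the position for logging
            }
        }
        return sb.ToString();
    }
}
```

Usage would be along the lines of `var removed = new List<int>(); string validXml = XmlSanitizer.Strip(invalidXml, removed);`, after which `removed.Count` tells you whether anything was stripped.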

Upvotes: 0
