Reputation: 7176
... specifically xA3 (£, £, £)
I'm loading several long XML documents, and every so often I run into one that won't load and throws this exception:
Invalid character in the given encoding. Line x, position y.
Here's the code in question:
var doc = new XmlDocument();
doc.Load(file.FullName);
When I look at the indicated line in the offending document, I see xA3 displayed in inverse video (white text on a black background) inside one of the XML tags.
The header of each XML file is nothing remarkable:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
This may sound like a really dumb question, but is there a way to either remove the offending character or tell the XmlDocument that reads the file to accept that character encoding?
Upvotes: 0
Views: 602
Reputation: 547
This answer assumes that your XML file does not contain a character entity for £ but the raw byte value 0xa3.
The UTF-8 encoding of the pound sign (U+00A3) is the two-byte sequence 0xc2 0xa3. If there is no 0xc2 byte before the 0xa3, your XML file is not UTF-8 encoded, and the encoding declaration in its header is wrong.
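To check that assumption before changing anything, here is a minimal C# sketch. The `Utf8Check` class and its method names are hypothetical, not from the question. It tries a strict UTF-8 decode of the file's raw bytes and, separately, reports the first 0xa3 byte that is not preceded by 0xc2. The second check is only a heuristic: 0xa3 can also legitimately appear as a continuation byte of other multi-byte UTF-8 characters.

```csharp
using System.Text;

// Hypothetical helper, not part of the original post.
public static class Utf8Check
{
    // Returns true if the bytes decode as valid UTF-8.
    // throwOnInvalidBytes: true makes the decoder throw DecoderFallbackException
    // instead of silently substituting U+FFFD for invalid bytes.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Heuristic: returns the offset of the first 0xA3 byte that is not preceded
    // by 0xC2 (i.e. not the second byte of a UTF-8 pound sign), or -1 if none.
    // It also flags 0xA3 used as a continuation byte of other characters.
    public static int FindLonePoundByte(byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] == 0xA3 && (i == 0 || bytes[i - 1] != 0xC2))
                return i;
        }
        return -1;
    }
}
```

For example, with `var bytes = File.ReadAllBytes(file.FullName);`, a result of `false` from `Utf8Check.IsValidUtf8(bytes)` together with a non-negative offset from `FindLonePoundByte(bytes)` supports the diagnosis that the file is not really UTF-8.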
If this is the case, you can either change the encoding in the XML header to ISO-8859-1 (i.e. <?xml version="1.0" encoding="ISO-8859-1"?>, where the pound sign is at code point 0xa3), or find out why your XML files are not UTF-8 encoded and fix them. Since I don't know whether your files contain any characters that do not exist in ISO-8859-1, I would prefer the second option.
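Both options can be sketched in C#, assuming the files really are ISO-8859-1. If they are in some other single-byte encoding, Latin-1 decoding will not fail but may produce the wrong characters. The hypothetical `Latin1Xml` class below decodes the bytes as ISO-8859-1 itself instead of editing the header, relying on the fact that when XmlDocument reads from a TextReader the text is already decoded, so the `encoding="UTF-8"` declaration is not used to decode it. It also shows a one-off conversion of an existing file to UTF-8 so that it matches its declaration; that repairs files already on disk but not whatever process produced them.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper; assumes the input files really are ISO-8859-1 (Latin-1).
public static class Latin1Xml
{
    // ISO-8859-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so 0xA3 decodes to '£'.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Option 1 without editing the file: decode the bytes as Latin-1 ourselves
    // and hand XmlDocument a TextReader of already-decoded characters.
    public static XmlDocument LoadAsLatin1(string path)
    {
        var doc = new XmlDocument();
        using (var reader = new StreamReader(path, Latin1))
        {
            doc.Load(reader);
        }
        return doc;
    }

    // Option 2 for existing files: re-encode the Latin-1 text as UTF-8 (no BOM)
    // so the bytes match the encoding="UTF-8" declaration already in the header.
    public static void ConvertToUtf8(string sourcePath, string targetPath)
    {
        string text = File.ReadAllText(sourcePath, Latin1);
        File.WriteAllText(targetPath, text, new UTF8Encoding(false));
    }
}
```

After `ConvertToUtf8`, the original `doc.Load(file.FullName)` call should load the converted file unchanged, because each lone 0xa3 has become the UTF-8 sequence 0xc2 0xa3.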
Upvotes: 2