Reputation: 45761
I am deserializing the following XML file using XmlSerializer with VSTS 2008, C#, and .NET 3.5.
Here is the XML file.
<?xml version="1.0" encoding="utf-8"?>
<Person><Name>=b?olu</Name></Person>
Here is a screenshot of how the XML file is displayed, along with its binary form.
If there is a way to accept such characters, that would be great. My XML file is large, so if such characters are really invalid and must be filtered out, I want to keep the remaining content of the XML file after deserialization.
Currently, XML deserialization fails with an InvalidOperationException and all of the information in the XML file is lost.
Also, when I open this XML file in VSTS, I get an error like this: Error 1 Character '?', hexadecimal value 0xffff is illegal in XML documents. I am confused, since there are no 0xffff values in the binary form.
Any solutions or ideas?
EDIT1: here is the code I use to deserialize the XML file:
static void Foo()
{
    XmlSerializer s = new XmlSerializer(typeof(Person));
    StreamReader file = new StreamReader("bug.xml");
    s.Deserialize(file);
}
public class Person
{
    public string Name;
}
Upvotes: 2
Views: 5057
Reputation: 161773
The "invalid characters" look like they might be intended to be encoded Unicode characters. Perhaps the wrong encoding is being used?
Can you ask the originators of this document what character they meant to include at that location? Perhaps ask them how they generated the document?
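On the encoding point, one thing that may explain why no FF FF bytes show up in the binary view: in UTF-8, the code point U+FFFF is stored as the three bytes EF BF BF. The snippet below is only a sketch for checking this. The byte array is a hypothetical stand-in, not taken from the actual file, so compare it against the bytes you see at the offending position.

```csharp
using System;
using System.Text;

static class Program
{
    static void Main()
    {
        // Hypothetical bytes: the UTF-8 encoding of the noncharacter U+FFFF.
        // Replace them with the bytes found at the failing position in the real file.
        byte[] bytes = { 0xEF, 0xBF, 0xBF };

        // Decode the bytes as UTF-8, the encoding declared in the XML header.
        string decoded = Encoding.UTF8.GetString(bytes);

        // Print each resulting UTF-16 code unit in U+XXXX form.
        foreach (char c in decoded)
        {
            Console.WriteLine("U+{0:X4}", (int)c);
        }
    }
}
```

If the real file has those three bytes at that spot, the parser is reporting the decoded character rather than the raw bytes, which would account for the mismatch with the hex view.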
Upvotes: 0
Reputation: 3154
Have you tried the DataContractSerializer instead? I ran into an interesting situation when someone copied and pasted some Word or Excel content into my web application: the string contained some invalid control characters (such as a vertical tab). To my surprise, it was serialized when sent to a WCF service and even came back 100% intact when requested again. The pure .NET environment had no problem with this, so I assume the DataContractSerializer can handle such characters (which is, IMHO, a violation of the XML spec).
We had another Java client accessing the same service - it failed when receiving this record...
[Edit after ugly format in my comment below]
Try this:
DataContractSerializer serializer = new DataContractSerializer(typeof(MyType));
// Write the object out as XML.
using (XmlWriter xmlWriter = new XmlTextWriter(filePath, Encoding.UTF8))
{
    serializer.WriteObject(xmlWriter, instanceOfMyType);
}
// Read it back from the same file.
MyType result;
using (XmlReader xmlReader = new XmlTextReader(filePath))
{
    result = serializer.ReadObject(xmlReader) as MyType;
}
The second Marc's comment is about DataContractSerializer's habit of emitting XML elements instead of XML attributes:
<AnElement>value</AnElement>
instead of
<AnElement AnAttribute="value" />
Upvotes: 1
Reputation: 5342
Does this style help?
<name>
<![CDATA[
=b?olu
]]>
</name>
Either that or encoding should do the trick.
EDIT: Found this page: http://www.eggheadcafe.com/articles/system.xml.xmlserialization.asp. Specifically, this code for deserialization:
public Object DeserializeObject(String pXmlizedString)
{
    XmlSerializer xs = new XmlSerializer(typeof(Automobile));
    MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(pXmlizedString));
    XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
    return xs.Deserialize(memoryStream);
}
The parts about "StringToUTF8ByteArray" and "Encoding.UTF8" are strangely absent from yours. I'm guessing .NET doesn't like reading the encoding of your actual XML file...?
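If the encoding turns out to be fine and the characters really are illegal, another option is to read the file as UTF-8 explicitly, drop anything outside the XML 1.0 character range, and deserialize the cleaned text. This is a rough sketch only: XmlSanitizer, IsLegalXmlChar and StripIllegalChars are names I made up, while bug.xml and Person come from the question. Note that it silently discards the offending characters, so only use it if losing them is acceptable.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class Person
{
    public string Name;
}

// Hypothetical helper, not part of the .NET Framework.
public static class XmlSanitizer
{
    // XML 1.0 allows #x9, #xA, #xD, #x20-#xD7FF and #xE000-#xFFFD
    // (plus #x10000-#x10FFFF, which are surrogate pairs in a .NET string).
    static bool IsLegalXmlChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    public static string StripIllegalChars(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // Keep well-formed surrogate pairs (code points above U+FFFF).
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (IsLegalXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (U+FFFE, U+FFFF, lone surrogates,
            // most control characters) is dropped.
        }
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // Read the whole file as UTF-8, matching the encoding in the XML declaration.
        string raw = File.ReadAllText("bug.xml", Encoding.UTF8);
        string clean = XmlSanitizer.StripIllegalChars(raw);

        XmlSerializer s = new XmlSerializer(typeof(Person));
        using (StringReader reader = new StringReader(clean))
        {
            Person p = (Person)s.Deserialize(reader);
            Console.WriteLine(p.Name);
        }
    }
}
```

This loads the entire file into memory, which may matter for a big file. A streaming variant that wraps a TextReader would avoid that, at the cost of more code.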
Upvotes: 1