InfoStatus

Reputation: 7113

XML Serialization of an Object Containing Invalid Chars

I'm serializing an object that contains HTML data in a String property.

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
' Using closes the file even if Serialize throws.
Using fs As New FileStream(FilePath, FileMode.Create)
    Formatter.Serialize(fs, Ob)
End Using

But when I'm reading the XML back to the Object:

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim Ob As MyObject
' Using closes the file even if Deserialize throws.
Using fs As New FileStream(FilePath, FileMode.Open)
    Ob = CType(Formatter.Deserialize(fs), MyObject)
End Using

I get this error:

"'', hexadecimal value 0x14, is an invalid character. Line 395, position 22."

Shouldn't .NET prevent this kind of error by escaping the invalid characters?

What's happening here and how can I fix it?

Upvotes: 5

Views: 17432

Answers (4)

John Saunders

Reputation: 161773

You should really post the code of the class you're trying to serialize and deserialize. In the meantime, I'll make a guess.

Most likely, the invalid character is in a field or property of type string. You will need to serialize that as an array of bytes, assuming you can't avoid having that character present at all:

[XmlRoot("root")]
public class HasBase64Content
{
    internal HasBase64Content()
    {
    }

    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement]
    public byte[] Base64Content
    {
        get
        {
            // GetBytes throws on null, so pass null through unchanged.
            return Content == null ? null : System.Text.Encoding.UTF8.GetBytes(Content);
        }
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}

This produces XML like the following:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Base64Content>AAECAwQFFA==</Base64Content>
</root>

I see you'd probably prefer VB.NET:

<XmlRoot("root")> _
Public Class HasBase64Content

    Private _content As String
    <XmlIgnore()> _
    Public Property Content() As String
        Get
            Return _content
        End Get
        Set(ByVal value As String)
            _content = value
        End Set
    End Property

    <XmlElement()> _
    Public Property Base64Content() As Byte()
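        Get
            ' GetBytes throws on Nothing, so pass Nothing through unchanged.
            If Content Is Nothing Then Return Nothing
            Return System.Text.Encoding.UTF8.GetBytes(Content)
        End Get
        Set(ByVal value As Byte())
            If value Is Nothing Then
                Content = Nothing
                Return
            End If
            Content = System.Text.Encoding.UTF8.GetString(value)
        End Set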
    End Property
End Class

Upvotes: 1

Brandon Kuehl

Reputation: 71

I set the XmlReaderSettings property CheckCharacters to false. I would only advise doing this if you have serialized the data yourself via XmlSerializer. If it's from an unknown source then it's not really a good idea.

public static T Deserialize<T>(string xml)
{
    var xmlReaderSettings = new XmlReaderSettings() { CheckCharacters = false };

    XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings);
    XmlSerializer xs = new XmlSerializer(typeof(T));

    return (T)xs.Deserialize(xmlReader);
}
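
To see what the setting changes, here is a minimal sketch that reads the same document with CheckCharacters on and off. The Page class, the CheckCharactersDemo helper and the sample XML are hypothetical stand-ins, and the sketch assumes the file holds the character as a numeric reference such as &#x14;.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the asker's MyObject.
public class Page
{
    public string Html { get; set; }
}

public static class CheckCharactersDemo
{
    // Assumed sample: a control character written as a numeric reference.
    public const string SampleXml = "<Page><Html>a&#x14;b</Html></Page>";

    public static Page Read(string xml, bool checkCharacters)
    {
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        var serializer = new XmlSerializer(typeof(Page));

        // Dispose the reader once deserialization is finished.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return (Page)serializer.Deserialize(reader);
        }
    }
}
```

Calling CheckCharactersDemo.Read(CheckCharactersDemo.SampleXml, true) should fail with an invalid character error like the one in the question, while passing false should return a Page whose Html still contains the 0x14 character.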

Upvotes: 6

lavinio

Reputation: 24299

It should really have failed in the serialize step, because 0x14 is not a valid character in XML 1.0. There is no way to escape it, not even with &#x14;, since XML 1.0 excludes it from the set of valid characters. I am actually surprised that the serializer lets this through, as it makes the serializer a non-conforming one.

Is it possible for you to remove the invalid characters from the string before serializing it? What is a 0x14 doing in HTML in the first place?

Or, is it possible you are writing with one encoding, and reading with a different one?
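
If removing the characters is acceptable, a minimal sketch of such a filter could look like the following. XmlTextSanitizer and StripInvalidXmlChars are hypothetical names, and the ranges are my reading of the XML 1.0 Char production, so treat it as an illustration rather than a complete solution.

```csharp
using System;
using System.Text;

public static class XmlTextSanitizer
{
    // True for UTF-16 code units that XML 1.0 allows on their own:
    // tab, LF, CR, U+0020..U+D7FF and U+E000..U+FFFD.
    private static bool IsAllowed(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    // Returns a copy of text without characters that XML 1.0 forbids (such as 0x14).
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // A valid surrogate pair encodes U+10000 or above, which XML allows.
                sb.Append(text[i]).Append(text[i + 1]);
                i++;
            }
            else if (IsAllowed(text[i]))
            {
                sb.Append(text[i]);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

You would call it on the string property just before serializing, for example Ob.Html = XmlTextSanitizer.StripInvalidXmlChars(Ob.Html), where Html is an assumed property name. Note that this silently changes the data, so it only fits if the control characters carry no meaning.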

Upvotes: 2

Piotr Owsiak

Reputation: 6249

I would expect .NET to handle this, but you can also have a look at the XmlSerializer class and XmlReaderSettings (see the sample generic method below):

public static T Deserialize<T>(string xml)
{
    var xmlReaderSettings = new XmlReaderSettings()
                                {
                                    ConformanceLevel = ConformanceLevel.Fragment,
                                    ValidationType = ValidationType.None
                                };

    XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings);
    XmlSerializer xs = new XmlSerializer(typeof(T), "");

    return (T)xs.Deserialize(xmlReader);
}

I would also check your code for encoding issues (Unicode, UTF-8, etc.). Hexadecimal value 0x14 is not a character you would expect in XML :)

Upvotes: 0
