Rumpelstinsk

Reputation: 3241

.net XML deserialization: uppercase and lowercase exception

I'm having trouble deserializing an XML file in .NET. This is the error I'm getting:

The opening tag 'A' on line 72 position 56 does not match the end tag of 'a'. Line 72, position 118.

As you can see, it is the same tag, but one is uppercase and the other is lowercase. My XML has this structure:

<?xml version="1.0"?>
<translationfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <translationtext>
       <es_text>Spanish text</es_text>
       <en_text>English text</en_text>
       <developer_comment>Plain text</developer_comment>
    </translationtext>
    ....
</translationfile>

And this is my VB class:

Option Strict Off
Option Explicit On

Imports System.Xml.Serialization

'
'This source code was auto-generated by xsd, Version=2.0.50727.3038.
'

'''<remarks/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True), _
 System.Xml.Serialization.XmlRootAttribute([Namespace]:="", IsNullable:=False)> _
Partial Public Class translationfile

    Private itemsField As List(Of translationfileTranslationtext)

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute("translationtext", _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property Items As List(Of translationfileTranslationtext)
        Get
            Return Me.itemsField
        End Get
        Set(value As List(Of translationfileTranslationtext))
            Me.itemsField = value
        End Set
    End Property
End Class

'''<remarks/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True)> _
Partial Public Class translationfileTranslationtext

    Private es_textField As String

    Private en_textField As String

    Private developer_commentField As String

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute( _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property es_text() As String
        Get
            Return Me.es_textField
        End Get
        Set(value As String)
            Me.es_textField = value
        End Set
    End Property

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute( _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property en_text() As String
        Get
            Return Me.en_textField
        End Get
        Set(value As String)
            Me.en_textField = value
        End Set
    End Property

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute( _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property developer_comment() As String
        Get
            Return Me.developer_commentField
        End Get
        Set(value As String)
            Me.developer_commentField = value
        End Set
    End Property
End Class

The problem is that both text fields can contain HTML. The XML is written by hand by the clients, and I cannot change the text inside these tags. They can also define their own tags, such as <client27tagname>...</client27tagname>. For example, this is a real case:

<translationtext>
    <es_text><p>Nombre</P></es_text>
    <en_text><p>Name</P></en_text>
    <developer_comment>irrelevant text</developer_comment>
</translationtext>

When I try to deserialize an XML file like this, I get the error above because <p> is lowercase and </P> is uppercase. How can I deserialize it correctly without changing the text? Is there any way to treat everything inside these tags as a plain string?

This is the code I'm using to deserialize:

Dim ser As New Xml.Serialization.XmlSerializer(GetType(translationfile))
Dim myperfil As translationfile

' Using closes the reader even when Deserialize throws
Using stream As New IO.StreamReader(path)
    myperfil = CType(ser.Deserialize(stream), translationfile) 'This line throws the exception
End Using

UPDATE

After making the change suggested by Olivier, this is my class:

Option Strict Off
Option Explicit On

Imports System.Xml.Serialization

<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True), _
 System.Xml.Serialization.XmlRootAttribute([Namespace]:="", IsNullable:=False)> _
Partial Public Class translationfile

    Private itemsField As List(Of translationfileTranslationtext)

    <System.Xml.Serialization.XmlElementAttribute("translationtext", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property Items As List(Of translationfileTranslationtext)
        Get
            Return Me.itemsField
        End Get
        Set(value As List(Of translationfileTranslationtext))
            Me.itemsField = value
        End Set
    End Property
End Class

<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True)> _
Partial Public Class translationfileTranslationtext

    Private es_textField As String

    Private en_textField As String

    Private developer_commentField As String

    <XmlIgnore()>
    Public Property es_text() As String
        Get
            Return Me.es_textField
        End Get
        Set(value As String)
            Me.es_textField = value
        End Set
    End Property

    <XmlElement(ElementName:="es_text", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property es_HtmlText() As String
        Get
            Return System.Web.HttpUtility.HtmlEncode(Me.es_textField)
        End Get
        Set(value As String)
            Me.es_textField = System.Web.HttpUtility.HtmlDecode(value)
        End Set
    End Property

    <XmlIgnore()>
    Public Property en_text() As String
        Get
            Return Me.en_textField
        End Get
        Set(value As String)
            Me.en_textField = value
        End Set
    End Property

    <XmlElement(ElementName:="en_text", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property en_HtmlText() As String
        Get
            Return System.Web.HttpUtility.HtmlEncode(Me.en_textField)
        End Get
        Set(value As String)
            Me.en_textField = System.Web.HttpUtility.HtmlDecode(value)
        End Set
    End Property

    <System.Xml.Serialization.XmlElementAttribute(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property developer_comment() As String
        Get
            Return Me.developer_commentField
        End Get
        Set(value As String)
            Me.developer_commentField = value
        End Set
    End Property
End Class

Upvotes: 0

Views: 2251

Answers (2)

Olivier Jacot-Descombes

Reputation: 112537

Use HttpUtility.HtmlEncode to encode your text and HttpUtility.HtmlDecode to decode it.

You could create an additional property for this and exclude the original property from serialization.

'Exclude the original property from serialization
<XmlIgnore()> _
Public Property en_text() As String
    Get
        Return Me.en_textField
    End Get
    Set(value As String)
        Me.en_textField = value
    End Set
End Property

'Name the encoding/decoding property element like the original property
<XmlElement(ElementName := "en_text", Form:=XmlSchemaForm.Unqualified)> _
Public Property en_HtmlEncodedText() As String
    Get
        Return HttpUtility.HtmlEncode(Me.en_textField)
    End Get
    Set(value As String)
        Me.en_textField = HttpUtility.HtmlDecode(value)
    End Set
End Property

HTML encoding translates "<" and ">" into "&lt;" and "&gt;", which makes the inner tags invisible to the XML parser.
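
As a quick illustration of that round trip, here is a minimal sketch. It assumes a project reference to System.Web; the module name EncodeRoundTrip is made up for the example.

```vbnet
Imports System
Imports System.Web

Module EncodeRoundTrip
    Sub Main()
        Dim html As String = "<p>Name</P>"

        ' "<" and ">" become entities, so an XML parser sees plain text.
        Dim encoded As String = HttpUtility.HtmlEncode(html)
        Console.WriteLine(encoded)   ' &lt;p&gt;Name&lt;/P&gt;

        ' Decoding restores the original markup, mismatched case included.
        Console.WriteLine(HttpUtility.HtmlDecode(encoded) = html)   ' True
    End Sub
End Module
```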


UPDATE

My solution works; I have tested it now. You probably tested it with an XML file that still contains the HTML tags in plain text ("<p>Name</P>"). My code writes the HTML as "&amp;lt;p&amp;gt;Name&amp;lt;/P&amp;gt;": HttpUtility.HtmlEncode turns "<" and ">" into "&lt;" and "&gt;", and the XML serializer then escapes each "&" as "&amp;". Therefore you must start by writing an XML file using my method. Only then will reading succeed.

Here is my write test:

Public Sub WriteTest()
    Dim myperfil As New translationfile With {
        .Items = New List(Of translationfileTranslationtext) From {
            New translationfileTranslationtext With {.en_text = "en test", .es_text = "spanish"},
            New translationfileTranslationtext With {.en_text = "<p>Name</P>", .es_text = "<p>Nombre</P>"}
        }
    }

    Dim writer As New IO.StreamWriter(path)
    Dim ser As New XmlSerializer(GetType(translationfile))
    ser.Serialize(writer, myperfil)
    writer.Close()
End Sub

It creates the following XML:

<?xml version="1.0" encoding="utf-8"?>
<translationfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <translationtext>
    <es_text>spanish</es_text>
    <en_text>en test</en_text>
  </translationtext>
  <translationtext>
    <es_text>&amp;lt;p&amp;gt;Nombre&amp;lt;/P&amp;gt;</es_text>
    <en_text>&amp;lt;p&amp;gt;Name&amp;lt;/P&amp;gt;</en_text>
  </translationtext>
</translationfile>

And here is my read test, which throws no exception:

Public Sub ReadTest()
    Dim myperfil As translationfile
    Dim reader As New IO.StreamReader(path)
    Dim ser As New XmlSerializer(GetType(translationfile))

    myperfil = CType(ser.Deserialize(reader), translationfile)
    reader.Close()

    For Each item As translationfileTranslationtext In myperfil.Items
        Console.WriteLine("EN = {0}, ES = {1}", item.en_text, item.es_text)
    Next
    Console.ReadKey()
End Sub

It writes this to the console:

EN = en test, ES = spanish
EN = <p>Name</P>, ES = <p>Nombre</P>

Upvotes: 1

Rumpelstinsk

Reputation: 3241

After some testing, I found a workaround.

  1. I read the whole file as a plain string
  2. I replace every < character with a placeholder string: #open_key#
  3. I replace every #open_key#es_text> with <es_text>
  4. Same for en_text, developer_comment, etc.
  5. I save the result to a temporary file
  6. I deserialize the temporary file
  7. Before returning the response, I replace every #open_key# with <
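
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the exact code I used: the module name, the function names and the list of structural element names are made up for the example, a regular expression stands in for the individual replacements of steps 3 and 4, and a StringReader stands in for the temporary file of steps 5 and 6.

```vbnet
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Xml.Serialization

Module PlaceholderWorkaround

    ' Placeholder from step 2; it must never occur in the real text.
    Private Const OpenKey As String = "#open_key#"

    ' Element names that belong to the file structure (assumed list).
    Private ReadOnly StructuralNames As String() =
        {"translationfile", "translationtext", "es_text", "en_text", "developer_comment"}

    ' Steps 2-4: hide every "<", then restore only the structural tags.
    Public Function HideInnerTags(raw As String) As String
        Dim text As String = raw.Replace("<", OpenKey)
        text = text.Replace(OpenKey & "?xml", "<?xml")

        ' Matches the placeholder followed by a known name (optionally with
        ' a leading "/"), provided the name ends at whitespace, "/" or ">".
        Dim pattern As String = Regex.Escape(OpenKey) &
            "(/?(?:" & String.Join("|", StructuralNames) & "))(?=[\s/>])"
        Return Regex.Replace(text, pattern, "<$1")
    End Function

    ' Step 7: put the "<" characters back into a deserialized value.
    Public Function RestoreInnerTags(value As String) As String
        If value Is Nothing Then Return Nothing
        Return value.Replace(OpenKey, "<")
    End Function

    ' Steps 1, 5 and 6: read the file, hide the inner tags, deserialize.
    Public Function DeserializeHidden(Of T)(path As String) As T
        Dim safeXml As String = HideInnerTags(File.ReadAllText(path))
        Dim ser As New XmlSerializer(GetType(T))
        Using reader As New StringReader(safeXml)
            Return CType(ser.Deserialize(reader), T)
        End Using
    End Function
End Module
```

It would be called as DeserializeHidden(Of translationfile)(path), followed by RestoreInnerTags on es_text, en_text and developer_comment of every item. Note that this only hides "<": a raw "&" or an HTML entity such as &nbsp; inside the client text would presumably still make the XML parser fail, so it would need the same kind of placeholder treatment.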

Upvotes: 0
