moerkb

Reputation: 53

Unmarshal XML element with alternating content type in Go

I am trying to unmarshal XML like this:

<message numerus="yes">
    <source>%n part(s)</source>
    <translation>
        <numerusform>%n part</numerusform>
        <numerusform>%n parts</numerusform>
    </translation>
</message>

<message>
    <source>Foo</source>
    <translation>Bar</translation>
</message>

Please note that the <translation> tag can contain either a simple string or several <numerusform> tags.

Using Go's encoding/xml package, the struct I am unmarshalling into looks like this:

type Message struct {
    Source       string   `xml:"source"`
    Numerus      string   `xml:"numerus,attr"`
    Translation  string   `xml:"translation"`
    NumerusForms []string `xml:"translation>numerusform"`
}

The problem: only one of the fields Translation and NumerusForms can be used at a time. If both are declared as shown here, an error occurs:

Error on unmarshalling xml: main.Message field "Translation" with tag "translation" conflicts with field "NumerusForms" with tag "translation>numerusform"

Very reasonable, as the unmarshaller cannot decide how to process the <translation> tag.
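
The conflict can be reproduced with a short program. This is only a minimal sketch: isTagPathConflict is a hypothetical helper name introduced here, and the check assumes the error is the *xml.TagPathError type that encoding/xml reports for clashing tag paths.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// Message mirrors the struct above: both Translation and NumerusForms
// claim the <translation> element.
type Message struct {
	Source       string   `xml:"source"`
	Numerus      string   `xml:"numerus,attr"`
	Translation  string   `xml:"translation"`
	NumerusForms []string `xml:"translation>numerusform"`
}

// isTagPathConflict (hypothetical helper) reports whether unmarshalling data
// into Message fails with a *xml.TagPathError.
func isTagPathConflict(data string) bool {
	var m Message
	err := xml.Unmarshal([]byte(data), &m)
	var conflict *xml.TagPathError
	return errors.As(err, &conflict)
}

func main() {
	// The struct tags are rejected when the type is inspected, so the
	// content of the element does not matter here.
	fmt.Println(isTagPathConflict(`<message></message>`))
}
```

The call should report a conflict even for an empty element, which suggests the problem lies in the struct definition rather than in any particular input.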

Is there any way to handle this? It would be OK to have two distinctly named fields (one for the plain string and one for the string list, as in the struct shown above).

For complete, runnable code, please refer to this Go Playground.

Side note: I am trying to parse a Qt Linguist TS file. The example is heavily stripped down to make it easier to reason about.

Upvotes: 3

Views: 76

Answers (1)

icza

Reputation: 417642

One easy solution that doesn't require a custom unmarshaler is to create a Translation struct with two fields: one for the optional text content and one for the optional <numerusform> child elements:

type Message struct {
    Source      string      `xml:"source"`
    Numerus     string      `xml:"numerus,attr"`
    Translation Translation `xml:"translation"`
}

type Translation struct {
    Content      string   `xml:",cdata"`
    NumerusForms []string `xml:"numerusform"`
}

This will output (try it on the Go Playground):

Source: %n part(s)
Numerus: yes
Translation: "\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t"
NumerusForms: [%n part %n parts]
  Numerus: %n part
  Numerus: %n parts

Source: Foo
Numerus: 
Translation: "Bar"
NumerusForms: []

Note that when <numerusform> child elements are present, the Translation.Content field is still populated with the whitespace (line breaks and indentation) surrounding them, which you can safely ignore.
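
If the leftover whitespace gets in the way, it can be trimmed. Below is a minimal, self-contained sketch rather than the exact Playground code: parseMessages, the Text method and the sample constant are names introduced here for illustration, and the input is assumed to be a bare sequence of <message> elements as in the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

type Message struct {
	Source      string      `xml:"source"`
	Numerus     string      `xml:"numerus,attr"`
	Translation Translation `xml:"translation"`
}

type Translation struct {
	Content      string   `xml:",cdata"`       // text directly inside <translation>
	NumerusForms []string `xml:"numerusform"` // optional plural forms
}

// Text (hypothetical helper) returns the plain translation with the
// surrounding whitespace removed; it is empty for numerus messages.
func (t Translation) Text() string {
	return strings.TrimSpace(t.Content)
}

// sample is the stripped-down input from the question.
const sample = `
<message numerus="yes">
    <source>%n part(s)</source>
    <translation>
        <numerusform>%n part</numerusform>
        <numerusform>%n parts</numerusform>
    </translation>
</message>

<message>
    <source>Foo</source>
    <translation>Bar</translation>
</message>`

// parseMessages (hypothetical helper) decodes consecutive <message>
// elements until the input is exhausted.
func parseMessages(data string) ([]Message, error) {
	dec := xml.NewDecoder(strings.NewReader(data))
	var msgs []Message
	for {
		var m Message // fresh value per element so slices do not accumulate
		err := dec.Decode(&m)
		if err == io.EOF {
			return msgs, nil
		}
		if err != nil {
			return nil, err
		}
		msgs = append(msgs, m)
	}
}

func main() {
	msgs, err := parseMessages(sample)
	if err != nil {
		fmt.Println("Error on unmarshalling xml:", err)
		return
	}
	for _, m := range msgs {
		fmt.Printf("Source: %s\nTranslation: %q\nNumerusForms: %v\n\n",
			m.Source, m.Translation.Text(), m.Translation.NumerusForms)
	}
}
```

With this approach, callers can check len(NumerusForms) first and fall back to Text() for simple messages, so the indentation never leaks into the result.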

Upvotes: 3
