user2659590

Unmarshal with global namespace

I have the following XML:

<rss version="2.0">
  <channel>
    ...
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
      ...
    </item>
  </channel>
</rss>

I want to extract the link from the <atom:link> element. I have the following struct:

type Item struct {
  Link string `xml:"http://www.w3.org/2005/Atom link"`
}

I know that I need a prefix to get the link, but no namespace is given globally (in the form of an xmlns attribute), and I don't know how to handle that.

I could, of course, save all *:link elements to a slice, but I'm sure there is a better solution.

Thanks in advance!

Upvotes: 3

Views: 386

Answers (1)

James Henstridge

Reputation: 43949

The namespace handling in the standard library encoding/xml package seems to be a bit ad hoc, and having elements with the same name in different namespaces seems to be a trigger.

Ideally you'd be able to decode the given XML into the following structures:

type Rss struct {
    Items []Item `xml:"channel>item"`
}

type Item struct {
    Link     string   `xml:"link"`
    AtomLink AtomLink `xml:"http://www.w3.org/2005/Atom link"`
}

type AtomLink struct {
    Href string `xml:"href,attr"`
}

But this results in the error main.Item field "Link" with tag "link" conflicts with field "AtomLink" with tag "http://www.w3.org/2005/Atom link" (as seen in http://play.golang.org/p/LgW-vm4euL).

However, if we ignore the <atom:link> element by commenting out the Item.AtomLink field, Item.Link decodes to an empty string. This is because xml:"link" matches <link> elements in any namespace rather than just the blank namespace, and the last match wins: the final <atom:link> element has no text content, so the field ends up empty.

A couple of possible workarounds include:

  1. Only try to decode the <atom:link> element, since it can be selected uniquely. This may not be useful if you're also processing RSS feeds without Atom namespace elements.

  2. Collect the contents of all <link> elements by modifying the Item struct to use:

    Links []string `xml:"link"`
    

    And then discard any empty strings in the slice.
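Both workarounds can be sketched roughly as follows. The feed is a trimmed copy of the XML from the question, and the type and function names (atomOnlyRss, allLinksRss, atomHrefs, nonEmptyLinks) are made up for this illustration:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sampleFeed is a trimmed-down copy of the feed from the question.
const sampleFeed = `<rss version="2.0">
  <channel>
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
    </item>
  </channel>
</rss>`

// Workaround 1: decode only the namespaced <atom:link> element.
type atomOnlyRss struct {
	Items []atomOnlyItem `xml:"channel>item"`
}

type atomOnlyItem struct {
	AtomLink atomLink `xml:"http://www.w3.org/2005/Atom link"`
}

type atomLink struct {
	Href string `xml:"href,attr"`
}

// Workaround 2: collect the text of every <link> element, in any namespace.
type allLinksRss struct {
	Items []allLinksItem `xml:"channel>item"`
}

type allLinksItem struct {
	Links []string `xml:"link"`
}

// atomHrefs returns the href attribute of each item's <atom:link> element.
func atomHrefs(feed string) ([]string, error) {
	var rss atomOnlyRss
	if err := xml.Unmarshal([]byte(feed), &rss); err != nil {
		return nil, err
	}
	var hrefs []string
	for _, item := range rss.Items {
		hrefs = append(hrefs, item.AtomLink.Href)
	}
	return hrefs, nil
}

// nonEmptyLinks returns the text of all <link> elements across all items,
// discarding the empty strings contributed by elements without text content.
func nonEmptyLinks(feed string) ([]string, error) {
	var rss allLinksRss
	if err := xml.Unmarshal([]byte(feed), &rss); err != nil {
		return nil, err
	}
	var links []string
	for _, item := range rss.Items {
		for _, link := range item.Links {
			if link != "" {
				links = append(links, link)
			}
		}
	}
	return links, nil
}

func main() {
	hrefs, err := atomHrefs(sampleFeed)
	fmt.Println(hrefs, err)

	links, err := nonEmptyLinks(sampleFeed)
	fmt.Println(links, err)
}
```

Note that the second variant flattens the links of all items into one slice for brevity; real code would probably keep them per item.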

At the end of the day, the package will need some way to refer to the blank namespace. That may require new syntax in order to keep existing programs functioning though.
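Until then, one further option (not one of the workarounds above, just a sketch) is to let each decoded element record its own name in an untagged XMLName field and then sort the results by namespace after decoding. The names nsRss, nsItem, nsLink and firstItemLinks are hypothetical, and the feed is again a trimmed copy of the one in the question:

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// atomNS is the Atom namespace URL used in the question's feed.
const atomNS = "http://www.w3.org/2005/Atom"

// feedWithBothLinks is a trimmed-down copy of the feed from the question.
const feedWithBothLinks = `<rss version="2.0">
  <channel>
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
    </item>
  </channel>
</rss>`

type nsRss struct {
	Items []nsItem `xml:"channel>item"`
}

type nsItem struct {
	// Matches <link> elements in any namespace.
	Links []nsLink `xml:"link"`
}

// nsLink keeps the full element name so the namespace can be checked later.
type nsLink struct {
	XMLName xml.Name // filled in by the decoder with the element's name
	Href    string   `xml:"href,attr"`
	Text    string   `xml:",chardata"`
}

// firstItemLinks returns the plain RSS link text and the Atom link href
// of the first item in the feed.
func firstItemLinks(feed string) (plain, atom string, err error) {
	var rss nsRss
	if err = xml.Unmarshal([]byte(feed), &rss); err != nil {
		return "", "", err
	}
	if len(rss.Items) == 0 {
		return "", "", errors.New("feed contains no items")
	}
	for _, link := range rss.Items[0].Links {
		switch link.XMLName.Space {
		case "": // blank namespace: the ordinary RSS <link>
			plain = link.Text
		case atomNS: // the <atom:link> element
			atom = link.Href
		}
	}
	return plain, atom, nil
}

func main() {
	plain, atom, err := firstItemLinks(feedWithBothLinks)
	fmt.Println(plain, atom, err)
}
```

This does the namespace selection in ordinary Go code instead of in the struct tag, which sidesteps the missing syntax at the cost of a little post-processing.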

Upvotes: 1
