surfearth

Reputation: 3147

Unmarshal multiple XML items

I am trying to unmarshal multiple items contained in nodes with an identical structure for further processing, but I don't seem to be able to access the data and I am not sure why. The XML data is structured as follows (I am trying to access all of the item elements):

<?xml version="1.0" encoding="ISO-8859-1" ?> 
<datainfo>
  <origin>NOAA/NOS/CO-OPS</origin>
  <producttype> Annual Tide Prediction </producttype>
  <IntervalType>High/Low Tide Predictions</IntervalType>
  <data>
    <item>
      <date>2015/12/31</date>
      <day>Thu</day>
      <time>03:21 AM</time>
      <predictions_in_ft>5.3</predictions_in_ft>
      <predictions_in_cm>162</predictions_in_cm>
      <highlow>H</highlow>
    </item>
    <item>
      <date>2015/12/31</date>
      <day>Thu</day>
      <time>09:24 AM</time>
      <predictions_in_ft>2.4</predictions_in_ft>
      <predictions_in_cm>73</predictions_in_cm>
      <highlow>L</highlow>
    </item>
  </data>
</datainfo>

My code is:

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "os"
)

// TideData stores a series of tide predictions
type TideData struct {
    Tides []Tide `xml:"data>item"`
}

// Tide stores a single tide prediction
type Tide struct {
    Date         string  `xml:"date"`
    Day          string  `xml:"day"`
    Time         string  `xml:"time"`
    PredictionFt float64 `xml:"predictions_in_ft"`
    PredictionCm float64 `xml:"predictions_in_cm"`
    HighLow      string  `xml:"highlow"`
}

func (t Tide) String() string {
    return t.Date + " " + t.Day + " " + t.Time + " " + t.HighLow
}

func main() {
    xmlFile, err := os.Open("9414275 Annual.xml")
    if err != nil {
        fmt.Println("Error opening file:", err)
        return
    }
    defer xmlFile.Close()

    b, _ := ioutil.ReadAll(xmlFile)

    var tides TideData
    xml.Unmarshal(b, &tides)

    fmt.Println(tides)
    for _, datum := range tides.Tides {
        fmt.Printf("\t%s\n", datum)
    }
}

When I run it, the output is empty, which leads me to think that the data is not being unmarshalled. The output is:

{[]}

Upvotes: 1

Views: 989

Answers (1)

James Henstridge

Reputation: 43899

You are ignoring the error return from xml.Unmarshal. By slightly modifying your program, we can see what is going on:

xml: encoding "ISO-8859-1" declared but Decoder.CharsetReader is nil
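For reference, the modification could look something like the sketch below. It assumes a shortened inline sample (the sample constant and the decodeSample helper are hypothetical stand-ins for your file and your main function) and simply stops discarding the error returned by xml.Unmarshal:

```go
package main

import (
    "encoding/xml"
    "fmt"
)

// TideData mirrors the struct from the question, trimmed to a single field.
type TideData struct {
    Tides []struct {
        Date string `xml:"date"`
    } `xml:"data>item"`
}

// sample is a hypothetical, shortened stand-in for the real file.
const sample = `<?xml version="1.0" encoding="ISO-8859-1" ?>
<datainfo><data><item><date>2015/12/31</date></item></data></datainfo>`

// decodeSample unmarshals the sample and returns the error instead of
// silently dropping it.
func decodeSample() (TideData, error) {
    var tides TideData
    err := xml.Unmarshal([]byte(sample), &tides)
    return tides, err
}

func main() {
    tides, err := decodeSample()
    if err != nil {
        // With the ISO-8859-1 declaration this branch is taken.
        fmt.Println("Error unmarshalling XML:", err)
        return
    }
    fmt.Println(tides)
}
```

The same check applies to the ignored error from ioutil.ReadAll in your original program.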

And poking around in the documentation, we find that by default the package only supports XML encoded in UTF-8:

    // CharsetReader, if non-nil, defines a function to generate
    // charset-conversion readers, converting from the provided
    // non-UTF-8 charset into UTF-8. If CharsetReader is nil or
    // returns an error, parsing stops with an error. One of the
    // CharsetReader's result values must be non-nil.
    CharsetReader func(charset string, input io.Reader) (io.Reader, error)

So it seems you need to provide your own character set conversion routine. You can inject it by modifying your code something like this:

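decoder := xml.NewDecoder(xmlFile)
decoder.CharsetReader = makeCharsetReader
// err is already declared in main, so scope a new one to the if statement
if err := decoder.Decode(&tides); err != nil {
    fmt.Println("Error decoding XML:", err)
    return
}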

(Note that we are now decoding from an io.Reader rather than a byte slice, so the ReadAll logic can be removed.) The golang.org/x/text/encoding family of packages might help you implement your makeCharsetReader function. Something like this might work:

import (
    "fmt"
    "io"

    "golang.org/x/text/encoding/charmap"
)

// makeCharsetReader wraps input in a reader that converts it to UTF-8.
func makeCharsetReader(charset string, input io.Reader) (io.Reader, error) {
    if charset == "ISO-8859-1" {
        // Windows-1252 matches ISO-8859-1 for all printable characters, so it
        // should do here (charmap.ISO8859_1 is the exact match)
        return charmap.Windows1252.NewDecoder().Reader(input), nil
    }
    return nil, fmt.Errorf("unknown charset: %s", charset)
}

You should then be able to decode the XML.
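If you would rather not pull in golang.org/x/text, a standard-library-only variant is possible, because every ISO-8859-1 byte maps directly to the Unicode code point with the same value. The sketch below is only an illustration under that assumption: latin1CharsetReader, decodeTides, decodeSample and the inline sample are hypothetical names, the structs are trimmed, and the remaining input is buffered in memory before decoding.

```go
package main

import (
    "encoding/xml"
    "fmt"
    "io"
    "strings"
)

// Tide and TideData are trimmed versions of the structs from the question.
type Tide struct {
    Date    string `xml:"date"`
    HighLow string `xml:"highlow"`
}

type TideData struct {
    Tides []Tide `xml:"data>item"`
}

// latin1CharsetReader (hypothetical name) converts ISO-8859-1 input to UTF-8
// by treating each byte as the Unicode code point with the same value.
func latin1CharsetReader(charset string, input io.Reader) (io.Reader, error) {
    if !strings.EqualFold(charset, "ISO-8859-1") {
        return nil, fmt.Errorf("unsupported charset: %s", charset)
    }
    raw, err := io.ReadAll(input) // buffers the rest of the document
    if err != nil {
        return nil, err
    }
    runes := make([]rune, len(raw))
    for i, b := range raw {
        runes[i] = rune(b)
    }
    // Converting []rune to string produces UTF-8.
    return strings.NewReader(string(runes)), nil
}

// decodeTides decodes tide predictions from r using the reader above.
func decodeTides(r io.Reader) (TideData, error) {
    var tides TideData
    decoder := xml.NewDecoder(r)
    decoder.CharsetReader = latin1CharsetReader
    err := decoder.Decode(&tides)
    return tides, err
}

// sample is a hypothetical, shortened stand-in for the real file.
const sample = `<?xml version="1.0" encoding="ISO-8859-1" ?>
<datainfo><data>
<item><date>2015/12/31</date><highlow>H</highlow></item>
<item><date>2015/12/31</date><highlow>L</highlow></item>
</data></datainfo>`

// decodeSample runs decodeTides over the inline sample.
func decodeSample() (TideData, error) {
    return decodeTides(strings.NewReader(sample))
}

func main() {
    tides, err := decodeSample()
    if err != nil {
        fmt.Println("Error decoding XML:", err)
        return
    }
    for _, t := range tides.Tides {
        fmt.Println(t.Date, t.HighLow)
    }
}
```

In your program you would pass xmlFile to decodeTides instead of the inline sample. For very large files a streaming decoder such as the charmap one above avoids holding the whole document in memory.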

Upvotes: 6
