Reputation: 51
Any idea why this parsing fails when the XML is fetched directly from the site, but works when I copy and paste the same XML into a variable?
package main

import (
	"encoding/xml"
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

type Sitemapindex struct {
	Locations []Location `xml:"channel>item"`
}

type Location struct {
	Loc string `xml:"title"`
}

func (e Location) String() string {
	// Return the title as-is; passing it to Sprintf as a format string
	// would mangle any title that contains a '%'.
	return e.Loc
}

func main() {
	resp, _ := http.Get("https://www.sec.gov/Archives/edgar/xbrlrss.all.xml")
	bytes, _ := ioutil.ReadAll(resp.Body)
	string_body := string(bytes)
	var s Sitemapindex
	decoder := xml.NewDecoder(strings.NewReader(string_body))
	decoder.Strict = false
	decoder.Decode(&s)
	fmt.Println(s)
}
Upvotes: 5
Views: 218
Reputation: 6187
The content you're parsing is encoded as windows-1252. Go's XML decoder only reads UTF-8 on its own, so to decode this data it needs a CharsetReader that can convert the declared charset.
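import (
	"encoding/xml"

	"golang.org/x/net/html/charset"
)

// reader is the source of the XML (for example resp.Body); s is the target struct.
decoder := xml.NewDecoder(reader)
// Convert whatever encoding the XML declaration names into UTF-8.
decoder.CharsetReader = charset.NewReaderLabel
err := decoder.Decode(&s)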
I'd guess that the error returned by decoder.Decode, which the code in the question discards, says something similar.
Upvotes: 4