Krash

Reputation: 2297

How to unmarshal XML data from an online XML file

I have an XML file at, say, https://www.notre-shop.com/sitemap_products_1.xml, and I want to unmarshal it in my Go code, so I did this:

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

var Product struct {
    Locs []string `xml:"url>loc"`
    Name []string `xml:"url>image:title"`
}

func main() {
    res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()

    data, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }

    err = xml.Unmarshal(data, &Product)
    if err != nil {
        log.Fatal(err)
    }
    for x := range Product.Name {
        fmt.Println(Product.Name[x], Product.Locs[x])
    }
}

But this doesn't print anything. What am I doing wrong?

Here is the complete code on the Go Playground: https://play.golang.org/p/pZ6j4-lSEz

Upvotes: 3

Views: 2065

Answers (2)

Josh Lubawy

Reputation: 396

Please try the following code, which works for me (note: you could also use ioutil.ReadAll and xml.Unmarshal as you had before, instead of xml.NewDecoder(...).Decode):

package main

import (
    "encoding/xml"
    "fmt"
    "log"
    "net/http"
)

// <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
//    <url>
//        <loc>
//            https://www.notre-shop.com/products/test-product-releasing-soon-2
//        </loc>
//        <lastmod>2017-01-17T08:04:44Z</lastmod>
//        <changefreq>daily</changefreq>
//        <image:image>
//            <image:loc>
//                https://cdn.shopify.com/s/files/1/0624/0605/products/NOTRE-CHICAGO-QK9C9548_fde37b05-495e-47b0-8dd1-b053c9ed3545.jpg?v=1481853712
//            </image:loc>
//            <image:title>Test Product Releasing Soon 2</image:title>
//        </image:image>
//    </url>
// </urlset>
type URLSet struct {
    XMLName xml.Name `xml:"urlset"` // must be xml.Name for the decoder to store the root element's name

    URLs []URL `xml:"url"`
}

type URL struct {
    Loc   string `xml:"loc"`
    Image Image  `xml:"image"`
}

type Image struct {
    Title string `xml:"title"`
}

func main() {
    resp, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
    if err != nil {
        log.Fatalln(err) // log.Fatalln always exits the program, so call it only after checking err != nil
    }
    defer resp.Body.Close()

    var urlSet URLSet
    if err = xml.NewDecoder(resp.Body).Decode(&urlSet); err != nil {
        log.Fatalln(err)
    }

    for _, url := range urlSet.URLs {
        fmt.Println(url.Loc, url.Image.Title)
    }
}
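For reference, the ioutil.ReadAll plus xml.Unmarshal route mentioned in the note at the top of this answer could look roughly like this. It is a minimal sketch, not run against the live site: the helper parseSitemap and the trimmed sample document are hypothetical, and the sample stands in for the bytes you would read from resp.Body.

```go
package main

import (
    "encoding/xml"
    "fmt"
    "log"
)

// Same shapes as the answer's types, with XMLName typed as xml.Name.
type URLSet struct {
    XMLName xml.Name `xml:"urlset"`
    URLs    []URL    `xml:"url"`
}

type URL struct {
    Loc   string `xml:"loc"`
    Image Image  `xml:"image"`
}

type Image struct {
    Title string `xml:"title"`
}

// parseSitemap unmarshals a sitemap that is already in memory, for example
// the bytes returned by ioutil.ReadAll(resp.Body). (Hypothetical helper.)
func parseSitemap(data []byte) (URLSet, error) {
    var set URLSet
    err := xml.Unmarshal(data, &set)
    return set, err
}

// sample is a trimmed, hypothetical stand-in for the downloaded sitemap.
const sample = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.notre-shop.com/products/test-product-releasing-soon-2</loc>
    <image:image>
      <image:title>Test Product Releasing Soon 2</image:title>
    </image:image>
  </url>
</urlset>`

func main() {
    set, err := parseSitemap([]byte(sample))
    if err != nil {
        log.Fatalln(err)
    }
    for _, u := range set.URLs {
        fmt.Println(u.Loc, u.Image.Title)
    }
}
```

Reading everything into a []byte first is simple, but it holds the whole response in memory before parsing; the decoder version above parses straight from the response body.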

Upvotes: 3

Sarath Sadasivan Pillai

Reputation: 7091

This is what the XML Specification says:

 The Namespaces in XML Recommendation [XML Names] assigns a meaning 
 to names containing colon characters. Therefore, authors should not 
 use the colon in XML names except for namespace purposes, but XML 
 processors must accept the colon as a name character.

This is what the XML Namespaces Recommendation says:

The XML Namespaces Recommendation expresses universal names in an indirect way that is compatible with XML 1.0. In effect the XML Namespaces Recommendation defines a mapping from an XML 1.0 tree where element type names and attribute names are local names into a tree where element type names and attribute names can be universal names. The mapping is based on the idea of a prefix. If an element type name or attribute name contains a colon, then the mapping treats the part of the name before the colon as a prefix, and the part of the name after the colon as the local name. A prefix foo refers to the URI specified in the value of the xmlns:foo attribute.
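You can watch Go apply this prefix-to-URI mapping by walking the tokens with xml.Decoder.Token, which replaces each prefix with the namespace URI it is bound to (RawToken does not). A minimal sketch; the helper startNames and the trimmed sample document are hypothetical:

```go
package main

import (
    "encoding/xml"
    "fmt"
    "io"
    "log"
    "strings"
)

// startNames returns "namespace-URI local-name" for every start element in
// doc. Token has already translated prefixes such as "image" into URIs.
func startNames(doc string) ([]string, error) {
    dec := xml.NewDecoder(strings.NewReader(doc))
    var names []string
    for {
        tok, err := dec.Token()
        if err == io.EOF {
            return names, nil
        }
        if err != nil {
            return nil, err
        }
        if se, ok := tok.(xml.StartElement); ok {
            names = append(names, se.Name.Space+" "+se.Name.Local)
        }
    }
}

// sample is a trimmed, hypothetical sitemap with a default namespace and
// an "image" prefix.
const sample = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.notre-shop.com/products/test-product-releasing-soon-2</loc>
    <image:image><image:title>Test Product Releasing Soon 2</image:title></image:image>
  </url>
</urlset>`

func main() {
    names, err := startNames(sample)
    if err != nil {
        log.Fatalln(err)
    }
    for _, n := range names {
        fmt.Println(n)
    }
}
```

Elements without a prefix pick up the default namespace declared by xmlns, while image:image and image:title come back with the Google image-sitemap URI as their namespace.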

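You cannot use a name containing a colon (:) in the struct tag to reach the inner elements: encoding/xml compares the tag against the element's local name (title), so image:title never matches. Instead, you can leave out the prefix. Here is your code rewritten: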

For performance and memory reasons, since you already have an io.Reader (res.Body), you can use xml.Decoder instead of xml.Unmarshal.

package main

import (
    "encoding/xml"
    "fmt"
    "log"
    "net/http"
)

var Product struct {
    Locs []string `xml:"url>loc"`
    Name []Image  `xml:"url>image"`
}

type Image struct {
    Title string `xml:"title"`
}

func main() {
    res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    decoder := xml.NewDecoder(res.Body)
    err = decoder.Decode(&Product)
    if err != nil {
        log.Fatal(err)
    }
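    for x := range Product.Name {
        // Locs and Name are filled independently, so the indexes line up only
        // when every <url> has exactly one <image:image>; the check avoids a panic.
        if x < len(Product.Locs) {
            fmt.Println(Product.Name[x].Title, Product.Locs[x])
        }
    }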
}
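If you would rather keep the namespace than drop the prefix, encoding/xml also accepts struct tags of the form "namespace-URL local-name". The sketch below assumes the image namespace URL is the one the sitemap declares for the image prefix; the sample document (including its example.com image URL) and the parse helper are hypothetical. Images is a slice because a <url> may contain several <image:image> entries.

```go
package main

import (
    "encoding/xml"
    "fmt"
    "log"
)

type URLSet struct {
    URLs []URL `xml:"url"`
}

type URL struct {
    Loc string `xml:"loc"`
    // "namespace-URL local-name": matches <image:image> only when the image
    // prefix is bound to this URL in the document.
    Images []Image `xml:"http://www.google.com/schemas/sitemap-image/1.1 image"`
}

type Image struct {
    Loc   string `xml:"http://www.google.com/schemas/sitemap-image/1.1 loc"`
    Title string `xml:"http://www.google.com/schemas/sitemap-image/1.1 title"`
}

// parse is a hypothetical helper; with the live sitemap you could call
// xml.NewDecoder(res.Body).Decode(&set) instead.
func parse(data []byte) (URLSet, error) {
    var set URLSet
    err := xml.Unmarshal(data, &set)
    return set, err
}

const sample = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.notre-shop.com/products/test-product-releasing-soon-2</loc>
    <image:image>
      <image:loc>https://example.com/image.jpg</image:loc>
      <image:title>Test Product Releasing Soon 2</image:title>
    </image:image>
  </url>
</urlset>`

func main() {
    set, err := parse([]byte(sample))
    if err != nil {
        log.Fatalln(err)
    }
    for _, u := range set.URLs {
        for _, img := range u.Images {
            fmt.Println(u.Loc, img.Title)
        }
    }
}
```

Struct tags must be string literals, so the namespace URL is repeated in each tag rather than taken from a constant. Nesting the images under each URL also keeps titles and locations paired, which the two parallel slices cannot guarantee.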

Here is the playground link: play

Upvotes: 0
