Blaszard

Reputation: 32005

Equivalent to Python's HTML parsing function/module in Go?

I'm teaching myself Go and am stuck on fetching and parsing HTML/XML. In Python, I usually start a web scraping script with the following code:

from urllib.request import urlopen, Request
url = "http://stackoverflow.com/"
req = Request(url)
html = urlopen(req).read()

After that I have the raw HTML/XML as either a string or bytes and can go on to work with it. How can I do the same in Go? What I want is the raw HTML stored in either a string or a []byte (I don't mind which, since one converts easily to the other). I'm considering the gokogiri package for web scraping in Go (though I'm not sure I'll end up using it), but it looks like it needs the raw HTML text before it can do any work...

So how can I obtain such an object?

Or is there a better way to do web scraping in Go?

Thanks.

Upvotes: 0

Views: 119

Answers (1)

Intermernet

Reputation: 19418

From the http.Get example in the Go documentation:

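package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // Send a GET request; err is non-nil only if the request itself failed.
    res, err := http.Get("http://www.google.com/robots.txt")
    if err != nil {
        log.Fatal(err)
    }
    // Read the whole body into a []byte (io.ReadAll replaces the
    // deprecated ioutil.ReadAll as of Go 1.16), then close the body.
    robots, err := io.ReadAll(res.Body)
    res.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", robots)
}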

This reads the contents of http://www.google.com/robots.txt into the variable robots, which is a []byte; use string(robots) if you need a string.
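To mirror the Python snippet from the question more closely, the same steps can be wrapped in a small helper that returns the page as a string. This is only a sketch: the function name fetch and the 10-second timeout are my own choices, not anything from the Go documentation, and it assumes Go 1.16 or later for io.ReadAll.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// fetch downloads url and returns the response body as a string.
// The name and the 10-second timeout are illustrative assumptions.
func fetch(url string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	res, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	// A 4xx/5xx response is not reported as an error by Get,
	// so the status code is checked explicitly.
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", url, res.Status)
	}

	body, err := io.ReadAll(res.Body) // body is a []byte
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// URL taken from the question.
	html, err := fetch("http://stackoverflow.com/")
	if err != nil {
		log.Printf("fetch failed: %v", err)
		return
	}
	fmt.Println(len(html), "bytes of HTML")
}
```

If you would rather keep the []byte, return body directly and skip the conversion; the string(...) conversion copies the data.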

For XML parsing, look into Go's encoding/xml package.
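As a rough illustration of encoding/xml, the sketch below unmarshals a small document into structs. The XML layout, the type names feed and item, and the helper parseFeed are all made up for this example; in practice data would be the []byte you read from the response body.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// item and feed mirror a made-up XML layout used only for this example.
type item struct {
	Title string `xml:"title"`
	Link  string `xml:"link"`
}

type feed struct {
	XMLName xml.Name `xml:"feed"` // the root element must be <feed>
	Items   []item   `xml:"item"` // one entry per <item> child
}

// parseFeed decodes raw XML bytes into a feed value.
func parseFeed(data []byte) (feed, error) {
	var f feed
	err := xml.Unmarshal(data, &f)
	return f, err
}

func main() {
	data := []byte(`<feed>
  <item><title>First</title><link>http://example.com/1</link></item>
  <item><title>Second</title><link>http://example.com/2</link></item>
</feed>`)

	f, err := parseFeed(data)
	if err != nil {
		log.Fatal(err)
	}
	for _, it := range f.Items {
		fmt.Println(it.Title, it.Link)
	}
}
```

Note that encoding/xml expects well-formed XML, so it tends to fail on real-world HTML pages. For HTML, a dedicated parser such as the golang.org/x/net/html package is usually a better fit.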

Upvotes: 2
