Reputation: 32005
I'm now learning Go and am stuck on getting and parsing HTML/XML. In Python, I usually write the following code when web scraping:
from urllib.request import urlopen, Request
url = "http://stackoverflow.com/"
req = Request(url)
html = urlopen(req).read()
Then I can get the raw HTML/XML as either a string or bytes and proceed to work with it. How can I do the same in Go? What I hope to get is the raw HTML data stored in either a string or a []byte (they can easily be converted, so I don't mind which I get). I'm considering the gokogiri package for web scraping in Go (not sure I'll actually end up using it!), but it looks like it requires the raw HTML text before it can do any work with it...
So how can I acquire such an object?
Or is there a better way to do web scraping in Go?
Thanks.
Upvotes: 0
Views: 119
Reputation: 19418
From the Go http.Get documentation example:
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	// Fetch the page; http.Get returns a *http.Response.
	res, err := http.Get("http://www.google.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}
	// Read the whole response body into a []byte.
	robots, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s", robots)
}
This fetches the contents of http://www.google.com/robots.txt into the variable robots, which is a []byte (the %s verb prints it as a string).
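If you would rather have a string (as in your Python version), a plain type conversion on the []byte is all you need. A minimal sketch, assuming the same imports as the example above; fetchHTML is just an illustrative helper name, not part of any library:

// fetchHTML returns the response body of url as a string.
// (Illustrative helper, not part of the example above.)
func fetchHTML(url string) (string, error) {
	res, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := ioutil.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil // []byte -> string conversion
}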
For XML parsing, look into the Go encoding/xml package.
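For example, here is a minimal sketch of unmarshalling XML with encoding/xml; the Page struct, its fields, and the sample document are illustrative assumptions, not taken from any real site:

package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Page is an illustrative struct; the xml tags must match the real element names.
type Page struct {
	Title string `xml:"title"`
	Body  string `xml:"body"`
}

func main() {
	// In practice this would be the []byte you fetched with http.Get.
	data := []byte(`<page><title>Hello</title><body>World</body></page>`)
	var p Page
	if err := xml.Unmarshal(data, &p); err != nil {
		log.Fatal(err)
	}
	fmt.Println(p.Title, p.Body) // Hello World
}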
Upvotes: 2