Reputation: 547
I have an HTML table that I would like to parse, something like the one at the following link:
http://sprunge.us/IJUC
However, I'm not sure of a good way to extract the information. I've seen a couple of HTML parsers, but they seem to require that everything you want to grab is marked with a special tag or attribute; the majority of my data is simply inside <td></td> elements.
Does anyone have a suggestion for parsing this information out?
Upvotes: 8
Views: 9668
Reputation: 2563
You may also be interested in Go's experimental HTML parser: https://code.google.com/p/go.net/html
The package definition according to the godoc:
Package html implements an HTML5-compliant tokenizer and parser
I haven't used it myself, but it seems pretty straightforward:
Parsing is done by calling Parse with an io.Reader, which returns the root of the parse tree (the document element) as a *Node. It is the caller's responsibility to ensure that the Reader provides UTF-8 encoded HTML.
go get code.google.com/p/go.net/html
import "code.google.com/p/go.net/html"
doc, err := html.Parse(r)
It is not part of any current release, but you can use it if you install Go from source or use the golang-tip Ubuntu apt repo.
EDIT: You can also use this mirror of the experimental Go packages: https://github.com/kless/go-exp
go get github.com/kless/go-exp/html
import (
"github.com/kless/go-exp/html"
)
Upvotes: 2
Reputation: 24003
Shameless plug: My goquery library. It's the jQuery syntax brought to Go (requires Go's experimental html package, see instructions in the README of the library).
So you can do things like this (assuming your HTML document is loaded in doc, a *goquery.Document):
doc.Find("td").Each(func(i int, s *goquery.Selection) {
	fmt.Printf("Content of cell %d: %s\n", i, s.Text())
})
Edit: Changed doc.Root.Find to doc.Find in the example, since a goquery Document is now a Selection too (new in v0.2/master branch).
Upvotes: 16
Reputation: 64
If your HTML is well-formed, you can use the built-in XML parser:
http://golang.org/pkg/encoding/xml/
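A minimal sketch of that approach, assuming the table is a bare <table> whose <tr> rows sit directly under it (no <tbody>) and whose cells contain plain text. The type and function names (xmlTable, xmlRow, parseTableXML) are hypothetical. The decoder settings used here are the ones the encoding/xml documentation suggests for handling typical HTML, but markup that is far from well-formed will still fail to decode.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// xmlTable and xmlRow are hypothetical types describing a bare
// <table><tr><td>...</td></tr></table> layout with no <tbody> wrapper.
type xmlTable struct {
	Rows []xmlRow `xml:"tr"`
}

type xmlRow struct {
	Cells []string `xml:"td"`
}

// parseTableXML decodes src and returns the cell text grouped by row.
func parseTableXML(src string) ([][]string, error) {
	d := xml.NewDecoder(strings.NewReader(src))
	// Relax the decoder so common HTML habits are tolerated.
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity

	var t xmlTable
	if err := d.Decode(&t); err != nil {
		return nil, err
	}
	rows := make([][]string, 0, len(t.Rows))
	for _, r := range t.Rows {
		rows = append(rows, r.Cells)
	}
	return rows, nil
}

func main() {
	const src = `<table><tr><td>apple</td><td>3</td></tr><tr><td>pear</td><td>5</td></tr></table>`
	rows, err := parseTableXML(src)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for i, row := range rows {
		fmt.Printf("row %d: %q\n", i, row)
	}
}
```

If the real table wraps its rows in <tbody>, the struct tag on Rows would need to become `xml:"tbody>tr"`; a real HTML parser does not need that kind of adjustment.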
Upvotes: -1