Reputation: 99
I am trying to write a simple web scraper in Go, and I can't get even the most basic functionality out of colly. I took the basic example from the colly docs, and while it worked with the hackerspaces.org site used there, it doesn't work with the site I am trying to scrape. I tried several variations of the URL (with https://, with www., with a trailing /, etc.) and nothing seems to work. I scraped the same site with Beautiful Soup in Python and got everything, so I know the site can be scraped. Any help is appreciated. Thanks.
package main
import (
"fmt"
"github.com/gocolly/colly"
)
// main function
func main() {
/* instantiate colly */
c := colly.NewCollector(
colly.AllowedDomains("www.bjjheroes.com/"),
)
// On every a element which has href attribute call callback
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
fmt.Printf("Link found: %q \n", e.Text)
})
c.Visit("www.bjjheroes.com/a-z-bjj-fighters-list")
}
Upvotes: 4
Views: 2283
Reputation: 99
colly.AllowedDomains(
"www.bjjheroes.com/",
"bjjheroes.com/",
"https://bjjheroes.com/",
"www.bjjheroes.com",
"bjjheroes.com",
"https://bjjheroes.com",
),
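After I replaced the AllowedDomains option with the list above, everything worked. The entries that matter are most likely the bare hostnames ("www.bjjheroes.com" and "bjjheroes.com"): colly compares AllowedDomains against the host of the URL being visited, and a host never contains a scheme or a trailing slash.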
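To see which string an AllowedDomains entry has to match, it can help to print the host part of each URL variant. This is only a rough sketch using the standard library's net/url package, on the assumption that colly checks the parsed host of the request URL. The helper name hostOf is made up for illustration and is not part of colly.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOf is a hypothetical helper (not part of colly). It returns only the
// host portion of a URL, which is the form an AllowedDomains entry is
// assumed to need: no scheme, no path, no trailing slash.
func hostOf(raw string) (string, error) {
	// Without a scheme, url.Parse treats the whole string as a path,
	// so add one before parsing.
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

func main() {
	variants := []string{
		"www.bjjheroes.com/",
		"https://bjjheroes.com/",
		"www.bjjheroes.com/a-z-bjj-fighters-list",
	}
	for _, v := range variants {
		host, err := hostOf(v)
		if err != nil {
			fmt.Println("cannot parse:", v, err)
			continue
		}
		fmt.Printf("%q has host %q\n", v, host)
	}
}
```

Under that assumption, the hosts printed here are the strings to pass to colly.AllowedDomains, and the URL passed to c.Visit should include the scheme (for example https://).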
Upvotes: 3