Krash

Reputation: 2287

How to scrape the text of an h1 tag using golang?

Suppose this is an h1 tag:

<h1>FindMe</h1>

It sits in a huge webpage with many other h1 tags, but it is the first h1 tag. I am using the net/html package and searching for the first StartTagToken. Once my program has found that token, how do I get the text inside the heading, i.e. FindMe in this case?

This is the code I have right now

z := html.NewTokenizer(body)

for {
    tt := z.Next()

    if tt == html.ErrorToken {
        // end of input (io.EOF) or a read error
        return
    } else if tt == html.StartTagToken {
        tag := z.Token()

        if tag.Data == "h1" {
            fmt.Println("We found the title")
            // some code to find what is stored in the heading
        }
    }
}

How do I go about doing that?

EDIT: More specifically, which field of the variable tag gives me the text inside it? I may be using the wrong terms here, so please bear with me.

Upvotes: 1

Views: 698

Answers (1)

ain

Reputation: 22759

What you have is the StartTagToken; the part you're interested in is the TextToken that sits between it and the corresponding EndTagToken. So you need to read the next token, and its Data field should be the value you're after, something like:

...
if tag.Data == "h1" {
    if tt = z.Next(); tt == html.TextToken {
        fmt.Println(z.Token().Data)
    }
}

Upvotes: 1
