mary

Reputation: 267

Get URLs inside an HTML page with HTML Agility Pack

I have this code:

    foreach (HtmlNode node in hd.DocumentNode.SelectNodes("//div[@class='compTitle options-toggle']//a"))
    {
        string s=("node:" + node.GetAttributeValue("href", string.Empty));
    }

I want to get the URLs from tags like this:

    <div class="compTitle options-toggle">
        <a class=" ac-algo fz-l ac-21th lh-24" href="http://www.bestbuy.com">
            <b>Huawei</b> Products - Best Buy
        </a>
    </div>

I want to get "http://www.bestbuy.com" and "Huawei Products - Best Buy".

What should I do? Is my code correct?

Upvotes: 0

Views: 797

Answers (2)

Alkis Giamalis

Reputation: 320

Adding the closing double quote to the XPath string should fix the selection (it worked for me).

Get the plain text with:

    string contentText = node.InnerText;

or, to keep the word Huawei in bold, get the inner HTML instead:

    string contentHtml = node.InnerHtml;
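To make the difference between the two properties concrete, here is a minimal sketch. The `InnerTextDemo` class and its `Extract` helper are hypothetical names made up for illustration, and the markup is a shortened copy of the question's sample. Because the anchor's content is surrounded by line breaks and indentation, the sketch also trims the result and decodes HTML entities with `HtmlEntity.DeEntitize`; both steps are optional.

```csharp
using System;
using HtmlAgilityPack;

static class InnerTextDemo
{
    // Hypothetical helper: returns the inner HTML and the plain text of the first matching <a>.
    public static (string Html, string Text) Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches, so check before use.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//div[@class='compTitle options-toggle']//a");
        if (node == null)
        {
            return (string.Empty, string.Empty);
        }

        string contentHtml = node.InnerHtml.Trim();                        // keeps the <b> markup
        string contentText = HtmlEntity.DeEntitize(node.InnerText).Trim(); // markup removed
        return (contentHtml, contentText);
    }

    static void Main()
    {
        // Assumed sample input, shortened from the question.
        string html =
            "<div class=\"compTitle options-toggle\">" +
            "<a href=\"http://www.bestbuy.com\">\n    <b>Huawei</b> Products - Best Buy\n</a>" +
            "</div>";

        var (contentHtml, contentText) = Extract(html);
        Console.WriteLine(contentHtml); // <b>Huawei</b> Products - Best Buy
        Console.WriteLine(contentText); // Huawei Products - Best Buy
    }
}
```

Without the `Trim()` calls, both strings would keep the leading and trailing whitespace from the source markup.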

Upvotes: 1

frenk91

Reputation: 929

This is an example of working code:

        // Requires: using System.Linq; using HtmlAgilityPack;
        var document = new HtmlDocument();
        document.LoadHtml("<div class=\"compTitle options-toggle\"><a class=\" ac-algo fz-l ac-21th lh-24\" href=\"http://www.bestbuy.com\"><b>Huawei</b> Products - Best Buy</a></div>");

        var tags = document.DocumentNode.SelectNodes("//div[@class='compTitle options-toggle']//a").ToList();

        foreach (var tag in tags)
        {
            var link = tag.Attributes["href"].Value; // http://www.bestbuy.com
            var text = tag.InnerText; // Huawei Products - Best Buy
        }
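One thing to watch with the loop above: `SelectNodes` returns `null` rather than an empty collection when the XPath matches nothing, so `.ToList()` would throw on a page without such a `div`. A more defensive variant might look like the sketch below. The `LinkExtractor` class and `GetLinks` method are hypothetical names, and the `contains(concat(...))` class test is a swapped-in XPath 1.0 idiom that matches `compTitle` regardless of the order or number of other classes; it is not what the original answer used.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Hypothetical helper: collects (href, text) pairs for anchors inside "compTitle" divs.
    public static List<(string Url, string Text)> GetLinks(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var results = new List<(string Url, string Text)>();

        // Match any div whose class list contains the token "compTitle",
        // and only anchors that actually carry an href attribute.
        HtmlNodeCollection tags = document.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' compTitle ')]//a[@href]");

        // SelectNodes yields null when there are no matches.
        if (tags == null)
        {
            return results;
        }

        foreach (HtmlNode tag in tags)
        {
            string link = tag.GetAttributeValue("href", string.Empty);
            string text = tag.InnerText.Trim();
            results.Add((link, text));
        }

        return results;
    }

    static void Main()
    {
        string html =
            "<div class=\"compTitle options-toggle\">" +
            "<a class=\" ac-algo fz-l ac-21th lh-24\" href=\"http://www.bestbuy.com\">" +
            "<b>Huawei</b> Products - Best Buy</a></div>";

        foreach (var (url, text) in GetLinks(html))
        {
            Console.WriteLine(url + " | " + text);
        }
    }
}
```

`GetAttributeValue` with a default avoids the `NullReferenceException` that `tag.Attributes["href"].Value` would raise on an anchor without an `href`.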

Upvotes: 1
