Gihan Lasita

Reputation: 3055

How do I get a link's title and href values separately with HTML Agility Pack?

I'm trying to download a page that contains a table like this:

<table id="content-table">
  <tbody>
    <tr>
      <th id="name">Name</th>
      <th id="link">link</th>
    </tr>

    <tr class="tt_row">

      <td class="ttr_name">
       <a title="name_of_the_movie" href="#"><b>name_of_the_movie</b></a>
       <br>
       <span class="pre">message</span>
      </td>

      <td class="td_dl">
        <a href="download_link"><img alt="Download" src="#"></a>
      </td>

    </tr>

    <tr class="tt_row"> .... </tr>
    <tr class="tt_row"> .... </tr>
  </tbody>
</table>

I want to extract name_of_the_movie from the td with class="ttr_name" and the download link from the td with class="td_dl".

This is the code I use to loop through the table rows:

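using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument hDocument = new HtmlAgilityPack.HtmlDocument();
hDocument.LoadHtml(htmlSource);
HtmlNode table = hDocument.DocumentNode.SelectSingleNode("//table[@id='content-table']");

// ".//" keeps the search inside this table; a bare "//" searches the whole document.
// Selecting class="tt_row" skips the header row, which has <th> cells instead of <td>.
foreach (HtmlNode row in table.SelectNodes(".//tr[@class='tt_row']"))
{
  // XPath positions are 1-based, so the first cell is td[1], not td[0].
  HtmlNode nameNode = row.SelectSingleNode("td[1]");
  HtmlNode linkNode = row.SelectSingleNode("td[2]");
}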

Currently I have no idea how to check nameNode and linkNode and extract the data inside them.

Any help would be appreciated.

Regards

Upvotes: 0

Views: 5394

Answers (3)

alexsuslin

Reputation: 4225

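    using System.Text.RegularExpressions;

    // Captures each href value into the named group "url". The (?!...) lookahead is
    // intended to skip values such as "#", mailto:, javascript: and stylesheet links.
    public const string UrlExtractor = @"(?: href\s*=)(?:[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(?<url>.*?)(?:[\s>""'])";

    // Returns the first match; call NextMatch() on the result to walk through the rest.
    public static Match GetMatchRegEx(string text)
    {
        return new Regex(UrlExtractor, RegexOptions.IgnoreCase).Match(text);
    }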

Here is how you can extract all href URLs. I'm using this regex in one of my projects; you can modify it to fit your needs and extend it to match the title attribute as well. I find it more convenient to match them in bulk.

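Following that suggestion, a minimal sketch of a regex approach that also picks up the title attribute might look like this. The AnchorRegexSketch class and Extract method are hypothetical names, and the patterns assume double-quoted attribute values. Regexes are fragile on real-world HTML (single-quoted or unquoted values, or a ">" inside an attribute, will break this), so an HTML parser such as HTML Agility Pack is usually the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorRegexSketch
{
    // Hypothetical pattern: matches an <a ...> start tag and captures everything
    // between "<a" and the first ">" as the attribute text.
    private static readonly Regex AnchorTag = new Regex(
        @"<a\b(?<attrs>[^>]*)>", RegexOptions.IgnoreCase);

    // Matches title="..." or href="..." inside the captured attribute text.
    // Assumes double-quoted values only.
    private static readonly Regex Attr = new Regex(
        @"\b(?<name>title|href)\s*=\s*""(?<value>[^""]*)""", RegexOptions.IgnoreCase);

    // Returns one (Title, Href) pair per <a> tag in document order;
    // an attribute that is not present comes back as null.
    public static List<(string Title, string Href)> Extract(string html)
    {
        var result = new List<(string Title, string Href)>();
        foreach (Match tag in AnchorTag.Matches(html))
        {
            string title = null;
            string href = null;
            foreach (Match attr in Attr.Matches(tag.Groups["attrs"].Value))
            {
                if (attr.Groups["name"].Value.Equals("title", StringComparison.OrdinalIgnoreCase))
                    title = attr.Groups["value"].Value;
                else
                    href = attr.Groups["value"].Value;
            }
            result.Add((title, href));
        }
        return result;
    }
}
```

For the table in the question, each row then yields two consecutive entries: the movie link (with a title) followed by the download link (title null, href set).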
Upvotes: 1

Tim Dams

Reputation: 747

I can't test it right now, but it should be something along the lines of:

    string name = nameNode.Element("a").Element("b").InnerText;
    string url = linkNode.Element("a").GetAttributeValue("href", "unknown");

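Putting those two lines into the question's loop gives roughly the following sketch. It assumes the HtmlAgilityPack NuGet package is referenced and that every data row has the layout shown in the question; MovieTableParser and Parse are hypothetical names. Rows are selected by class so the header row is skipped, and a row missing the expected <a> or <b> is skipped instead of throwing a NullReferenceException.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MovieTableParser
{
    // Returns (Name, Url) pairs from the table with id="content-table".
    public static List<(string Name, string Url)> Parse(string htmlSource)
    {
        var results = new List<(string Name, string Url)>();

        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(htmlSource);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[@id='content-table']//tr[@class='tt_row']");
        if (rows == null)
            return results;

        foreach (HtmlNode row in rows)
        {
            HtmlNode nameNode = row.SelectSingleNode("td[@class='ttr_name']");
            HtmlNode linkNode = row.SelectSingleNode("td[@class='td_dl']");

            // Same navigation as the answer: td -> a -> b for the name, td -> a for the link.
            HtmlNode bold = nameNode?.Element("a")?.Element("b");
            HtmlNode anchor = linkNode?.Element("a");
            if (bold == null || anchor == null)
                continue;

            // DeEntitize turns entities such as &amp; back into plain characters.
            string name = HtmlEntity.DeEntitize(bold.InnerText).Trim();
            string url = anchor.GetAttributeValue("href", "unknown");
            results.Add((name, url));
        }
        return results;
    }
}
```
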
Upvotes: 3

gizgok

Reputation: 7639

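    nameNode.Attributes["title"].Value
    linkNode.Attributes["href"].Value

presuming nameNode and linkNode are the <a> elements rather than the <td> cells, since the title and href attributes are on the links.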

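One way to get those <a> elements is to select them directly with XPath. The sketch below assumes the HtmlAgilityPack package and the row layout from the question; AttributeSketch and ReadRow are hypothetical names. Attributes["..."] returns an HtmlAttribute, or null when the attribute is absent, so the sketch uses ?. to avoid a NullReferenceException and .Value to get the string.

```csharp
using HtmlAgilityPack;

public static class AttributeSketch
{
    // Reads the movie link's title and the download link's href from one table row.
    // Returns null for either value if the link or attribute is missing.
    public static (string Title, string Href) ReadRow(HtmlNode row)
    {
        // Select the <a> elements themselves, because that is where the attributes live.
        HtmlNode nameLink = row.SelectSingleNode("td[@class='ttr_name']/a");
        HtmlNode downloadLink = row.SelectSingleNode("td[@class='td_dl']/a");

        // Attributes["..."] gives an HtmlAttribute (or null); .Value is the attribute text.
        string title = nameLink?.Attributes["title"]?.Value;
        string href = downloadLink?.Attributes["href"]?.Value;
        return (title, href);
    }
}
```
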
Upvotes: 1
