lostknight

Reputation: 109

Pulling a specific link from a site

I am trying to pull links from a website. The HTML I want looks like this:

<div class="default-2-3">
<h4>
  <a href="/en/news/esports/esports-editorial/na-lcs-week-8-tease-tsm">NA LCS Week 8 Tease: TSM</a>
 </h4>

My test code looks like:

string mainURL = "http://na.leagueoflegends.com/en/news/";
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(mainURL);

var findClass = doc.DocumentNode.Descendants("div")
                   .Where(d => d.Attributes.Contains("class") && 
                               d.Attributes["class"].Value.Contains("default-2-3"));
MessageBox.Show(findClass.ElementAt(1).ToString());

Currently the message box shows:

HtmlAgilityPack.HtmlNode

I based my code on the question "finding specific link from website". I am new to using HtmlAgilityPack beyond just copying the XPath.

For reference, the site I am trying to extract information from is: http://na.leagueoflegends.com/en/news/

Upvotes: 1

Views: 68

Answers (2)

M. Adeel Khalid

Reputation: 1796

Actually, there is more than one div with class="default-2-3". The query below returns the href of the link inside every one of them; call .First() on the result if you only want the first:

var href = doc.DocumentNode
              .SelectNodes(".//div[@class='default-2-3']//h4//a[@href]")
              .Select(x => x.Attributes["href"].Value);

Upvotes: 0

Hung Cao

Reputation: 3208

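You are close to the final step. Calling ToString() on an HtmlNode only prints the type name, which is what your message box shows. Go one step further and read the href of the a element inside each div: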

var findClass = doc.DocumentNode.Descendants("div")
                   .Where(d => d.Attributes.Contains("class") &&
                               d.Attributes["class"].Value.Contains("default-2-3"))
                   // Take the first <a> in each div and read its href as a string.
                   .Select(d => d.Descendants("a").FirstOrDefault()?.Attributes["href"]?.Value);
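
For anyone who wants to try this without hitting the live site, here is a minimal self-contained sketch of the same idea. It is not a definitive implementation: LinkExtractor and ExtractLinks are hypothetical names made up for illustration, the markup is the fragment from the question (with the closing </div> added) parsed from a string instead of web.Load, and turning the relative href into an absolute URL with System.Uri is an extra step the question did not ask for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack): for every <div> whose
    // class attribute contains "default-2-3", take the first <a> inside it and
    // return its href resolved against baseUri.
    public static List<string> ExtractLinks(string html, Uri baseUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("div")
            // GetAttributeValue falls back to "" when the attribute is missing.
            .Where(d => d.GetAttributeValue("class", "").Contains("default-2-3"))
            // Yields null when the div contains no <a> at all.
            .Select(d => d.Descendants("a").FirstOrDefault()?.GetAttributeValue("href", ""))
            // Skip divs without a usable link.
            .Where(href => !string.IsNullOrEmpty(href))
            // Resolve relative hrefs such as "/en/news/..." against the page URL.
            .Select(href => new Uri(baseUri, href).AbsoluteUri)
            .ToList();
    }

    public static void Main()
    {
        // Sample markup copied from the question, with the closing </div> added.
        const string html =
            @"<div class=""default-2-3""><h4>" +
            @"<a href=""/en/news/esports/esports-editorial/na-lcs-week-8-tease-tsm"">NA LCS Week 8 Tease: TSM</a>" +
            @"</h4></div>";

        var baseUri = new Uri("http://na.leagueoflegends.com/en/news/");

        foreach (var link in ExtractLinks(html, baseUri))
            Console.WriteLine(link);
    }
}
```

In a WinForms app you could pass string.Join("\n", links) to MessageBox.Show instead of writing to the console. Filtering out the null and empty entries up front means the caller never has to deal with divs that have no link.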

Upvotes: 1
