Reputation: 2442
I'm able to reach the node I want to extract, but I couldn't figure out how to get at the different tags within that node.
P.S. I'm OK with using a regular expression; I'm just curious whether a simpler way exists with Html Agility Pack.
Code:
...
...
HtmlNodeCollection nodes = webContent.DocumentNode.SelectNodes("//*[@id='node-name']/ul/li");
foreach (HtmlNode node in nodes) {
    String link = ???; // extract the http link here (href)
    String text = ???; // extract the inner text here
    String nums = ???; // extract the content of <small> tag here
    ...
}
HTML sample:
...
...
<ul class="some-class-name">
  <li>
    <a href="http://link-1.com">text for link 1<small>1</small></a>
  </li>
  <li>
    <a href="http://link-2.org">text for link 2<small>2</small></a>
  </li>
  <li>
    <a href="http://link-3.net">text for link 3<small>3</small></a>
  </li>
</ul>
...
...
Upvotes: 0
Views: 380
Reputation: 11408
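You can use either Element(s) or Descendants from Html Agility Pack's native API. Element(s) only looks at the direct children of a node, while Descendants searches the whole subtree.
Keep in mind that you can also use extensions such as this one to enable CSS selector querying, which, in my understanding, is the preferred (and easiest) way.
Here is a code snippet: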
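Sticking with the XPath query from the question, the three placeholders could be filled in roughly as follows. This is a minimal sketch, not the answer's own code, and it rests on a few assumptions: the HtmlAgilityPack NuGet package is referenced, the wrapper <div id='node-name'> is hypothetical (added only so the question's XPath has something to match), and the link text is built from the text nodes directly under <a> so that the <small> content is not mixed in. The names LinkExtractor and Extract are illustrative.

```csharp
// Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns (link, text, nums) for every <li> matched by the question's XPath.
    public static List<(string Link, string Text, string Nums)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var results = new List<(string Link, string Text, string Nums)>();

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@id='node-name']/ul/li");
        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null) return results;

        foreach (HtmlNode node in nodes)
        {
            // Relative XPath: the <a> child of the current <li>
            HtmlNode a = node.SelectSingleNode("a");
            if (a == null) continue;

            // href value, or "" when the attribute is missing
            string link = a.GetAttributeValue("href", "");

            // Only the text nodes directly under <a>, so the <small> content is left out
            string text = string.Concat(a.ChildNodes
                    .Where(n => n.NodeType == HtmlNodeType.Text)
                    .Select(n => n.InnerText))
                .Trim();

            // Inner text of the <small> child, or null if it is absent
            string nums = a.SelectSingleNode("small")?.InnerText;

            results.Add((link, text, nums));
        }
        return results;
    }

    public static void Main()
    {
        // The wrapper <div id='node-name'> is hypothetical, added so the question's XPath matches.
        const string html = @"
<div id='node-name'>
  <ul class='some-class-name'>
    <li><a href='http://link-1.com'>text for link 1<small>1</small></a></li>
    <li><a href='http://link-2.org'>text for link 2<small>2</small></a></li>
  </ul>
</div>";

        foreach (var (link, text, nums) in Extract(html))
            Console.WriteLine($"link: [{link}] text: [{text}] small: [{nums}]");
    }
}
```

Taking a.InnerText instead would give "text for link 11", because InnerText concatenates the text of every descendant, including the <small> element.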
//https://stackoverflow.com/q/70203208/1219280
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(@"
<ul class='some-class-name'>
  <li>
    <a href='http://link-1.com'>text for link 1<small>1</small></a>
  </li>
  <li>
    <a href='http://link-2.org'>text for link 2<small>2</small></a>
  </li>
  <li>
    <a href='http://link-3.net'>text for link 3<small>3</small></a>
  </li>
</ul>
");

Console.WriteLine("-------------------- Using Element(s) -------------------------");
// Element(s) only queries the direct children of a node (one level down)
var ul = doc.DocumentNode.Element("ul");
var lis = ul.Elements("li");
foreach (var li in lis)
{
    var a = li.Element("a");
    // "" is the default returned when the href attribute is missing
    var href = a?.GetAttributeValue("href", "");
    // Null-safe all the way down, in case an <li> has no <a> or no <small>
    var smallText = a?.Element("small")?.InnerText;
    Console.WriteLine($"a href: [{href}] small: [{smallText}]");
}

Console.WriteLine("-------------------- Using Descendants -------------------------");
// Descendants searches the whole subtree, at any depth
var anchors = doc.DocumentNode.Descendants("a");
foreach (var a in anchors)
{
    var href = a.GetAttributeValue("href", "");
    var smallText = a.Element("small")?.InnerText;
    Console.WriteLine($"a href: [{href}] small: [{smallText}]");
}
Output:
Upvotes: 1