Reputation: 43
This is a sample HTML input, part of a bigger HTML file:
string html = "<html> <p>Ingredients:</p> </html>";
I want to retrieve only the node whose inner text is Ingredients. The text may appear in any HTML element: p, div, strong, etc.
My C# code to achieve this using HtmlAgilityPack and LINQ is:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
List<HtmlNode> ingredientList = doc.DocumentNode.Descendants()
    .Where(x => x.InnerText.Contains("Ingredients:")).ToList();
This code returns 3 nodes:
<html> node
<p> node
#text node
I want to retrieve only:
<p> node
Upvotes: 4
Views: 2153
Reputation: 89295
If your platform supports XPath, i.e. HtmlAgilityPack's SelectNodes() method is available, you can use an XPath expression to select every element that has a direct child text node containing the keyword:
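// By default SelectNodes() returns null when nothing matches, so fall back to an empty list.
List<HtmlNode> ingredientList = doc.DocumentNode
    .SelectNodes("//*[text()[contains(.,'Ingredients:')]]")
    ?.ToList() ?? new List<HtmlNode>();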
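If SelectNodes() is not available on your platform, the same idea can be expressed with LINQ alone. The following is a minimal sketch, not a definitive implementation: it reuses the sample html string from the question and keeps only elements that own a direct child text node containing the keyword, which is why the enclosing html element and the #text node itself are filtered out.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Sample input taken from the question.
string html = "<html> <p>Ingredients:</p> </html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Keep an element only if one of its *direct* children is a text node
// containing the keyword. Ancestors such as <html> are skipped because
// their own text children hold only whitespace, and #text nodes are
// skipped because they are not elements.
List<HtmlNode> ingredientList = doc.DocumentNode
    .Descendants()
    .Where(n => n.NodeType == HtmlNodeType.Element
             && n.ChildNodes.Any(c => c.NodeType == HtmlNodeType.Text
                                   && c.InnerText.Contains("Ingredients:")))
    .ToList();
```

For the sample input this should leave only the p element in ingredientList. Unlike SelectNodes(), this query yields an empty list rather than null when nothing matches.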
Upvotes: 6