jayawant.karale

Reputation: 43

HtmlAgilityPack select only inner text Node

This is a sample of my HTML input, which is part of a larger HTML file.

string html = "<html> <p>Ingredients:</p> </html>";

I want to retrieve only the node whose own inner text contains "Ingredients:". The text may appear in any HTML element: p, div, strong, etc.

My C# code to achieve this using HtmlAgilityPack and LINQ is:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

List<HtmlNode> ingredientList = doc.DocumentNode.Descendants().Where
                        (x => x.InnerText.Contains("Ingredients:")).ToList();

This code returns 3 nodes:

<html> node
<p> node
#text node

I want to retrieve only:

<p> node

Upvotes: 4

Views: 2153

Answers (1)

har07

Reputation: 89295

If your platform supports XPath, i.e. HtmlAgilityPack's SelectNodes() method is available, you can use an XPath expression to get the elements where one of their direct child text nodes contains the keyword:

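// SelectNodes() returns null (not an empty collection) when nothing matches,
// so fall back to an empty list instead of calling ToList() on null.
List<HtmlNode> ingredientList = doc.DocumentNode
                                   .SelectNodes("//*[text()[contains(.,'Ingredients:')]]")
                                   ?.ToList() ?? new List<HtmlNode>();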
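
If SelectNodes() is not available on your platform, the same idea as the XPath above can probably be expressed with plain LINQ: keep only element nodes that have at least one direct child text node containing the keyword. This is a minimal sketch rather than a tested drop-in; the Program class and the Console output are only there for illustration, and it assumes the sample html string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample input taken from the question.
        string html = "<html> <p>Ingredients:</p> </html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Keep an element only if one of its *direct* child text nodes
        // contains the keyword. Ancestors such as <html> are skipped because
        // their own text children are just whitespace, and the #text node is
        // skipped because it is not an element.
        List<HtmlNode> ingredientList = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element
                     && n.ChildNodes.Any(c => c.NodeType == HtmlNodeType.Text
                                           && c.InnerText.Contains("Ingredients:")))
            .ToList();

        // Print the tag name of each matching element.
        foreach (HtmlNode node in ingredientList)
        {
            Console.WriteLine(node.Name);
        }
    }
}
```

The original query matched three nodes because InnerText on an element concatenates the text of all its descendants, so every ancestor of the text also "contains" the keyword. Checking only direct child text nodes avoids that.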

Upvotes: 6
