HtmlAgilityPack select only inner text Node

Question

This is my sample HTML input, which is part of a bigger HTML file.

string html = " Ingredients: ";

I want to retrieve only the node whose own inner text is "Ingredients:". The text may appear in any HTML element: p, div, strong, etc.

My C# code to achieve this using HtmlAgilityPack and LINQ is:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

List<HtmlNode> ingredientList = doc.DocumentNode.Descendants()
                        .Where(x => x.InnerText.Contains("Ingredients:"))
                        .ToList();

This code returns 3 nodes:

node

node #text node

I want to retrieve only:

node

har07 · Accepted Answer

If your platform supports XPath, i.e. HtmlAgilityPack's SelectNodes() method is available, you can use an XPath expression to get every element that has a direct child text node containing the keyword:

List<HtmlNode> ingredientList = doc.DocumentNode
                                   .SelectNodes("//*[text()[contains(.,'Ingredients:')]]")
                                   .ToList();
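
For completeness, here is a minimal, self-contained sketch of the approach above. The HTML string is a hypothetical stand-in, since the question's original markup is not shown. One thing to watch for: SelectNodes() returns null rather than an empty collection when nothing matches, so the sketch guards against that before calling ToList(). The second query is a LINQ equivalent for platforms without XPath support; it is an illustration of the same idea, not part of the answer itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical input: the original sample markup is not available.
        string html = "<div><p><strong>Ingredients:</strong> flour, water</p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Elements with at least one direct child text node containing the keyword.
        // SelectNodes returns null (not an empty collection) when there is no match.
        HtmlNodeCollection matches =
            doc.DocumentNode.SelectNodes("//*[text()[contains(.,'Ingredients:')]]");

        List<HtmlNode> ingredientList =
            matches == null ? new List<HtmlNode>() : matches.ToList();

        // LINQ equivalent (assumption: usable where XPath is unavailable):
        // keep element nodes that own a matching text node as a direct child.
        List<HtmlNode> viaLinq = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element
                     && n.ChildNodes.Any(c => c.NodeType == HtmlNodeType.Text
                                           && c.InnerText.Contains("Ingredients:")))
            .ToList();

        foreach (HtmlNode node in ingredientList)
        {
            Console.WriteLine(node.Name);
        }

        Console.WriteLine(viaLinq.Count == ingredientList.Count);
    }
}
```

With this hypothetical input, only the strong element owns a text node containing "Ingredients:", so the enclosing p and div are excluded, as is the #text node itself.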
