Getting specific data from HTML

Question

I want to get specific data from HTML. I'm using C# and HtmlAgilityPack.

Here's the HTML sample:

Greeting! Hi! // Hello! Hello! // i want to get this g Hi! //

WE

Here is my code in C#:

StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Lyrics);

var s = doc.DocumentNode.Descendants("p");

try
{
     foreach (HtmlNode childNode in s)
     {
          pureText.Append(childNode.InnerText);
     }
}
catch
{ }

UPDATE:

StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(URL);

var s = doc.DocumentNode.SelectNodes("//p[@class='verse']"); // error

try
{
     foreach (HtmlNode childNode in s)
     {
          pureText.Append(childNode.InnerText);
     }
}
catch
{ }

ERROR:

'HtmlAgilityPack.HtmlNode' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)

har07 · Accepted Answer

You can use XPath query syntax to select all <p> elements having class='verse', like this:

var s = doc.DocumentNode.SelectNodes("//p[@class='verse']");

Then do the same foreach as you already have.
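One thing to watch for when combining the XPath query with that foreach: SelectNodes returns null when nothing matches, so looping over the result directly can throw. Below is a minimal sketch of the whole flow with a null check, assuming a build of HtmlAgilityPack that exposes SelectNodes. The class and method names (VerseXPath, ExtractVerses) are made up for illustration.

```csharp
using System.Text;
using HtmlAgilityPack;

public static class VerseXPath
{
    // Hypothetical helper: concatenates the text of every <p class="verse">.
    public static string ExtractVerses(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml expects markup, not a URL

        var pureText = new StringBuilder();
        var nodes = doc.DocumentNode.SelectNodes("//p[@class='verse']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return string.Empty;

        foreach (HtmlNode node in nodes)
            pureText.AppendLine(node.InnerText);

        return pureText.ToString();
    }
}
```

If the page has to be fetched first, new HtmlWeb().Load(url) is one way to get an HtmlDocument from a URL instead of passing the URL to LoadHtml.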

UPDATE I :

I don't know why the code above throws an error for you. It has been tested on my PC and should work fine. Anyway, if you accept a workaround, the same query can be achieved without XPath this way:

var s = doc.DocumentNode.Descendants("p").Where(o => o.Attributes["class"] != null && o.Attributes["class"].Value == "verse");

This solution is longer since we need to check whether a node has a class attribute before checking the attribute's value. Otherwise, we'll get a NullReferenceException if there are any <p> elements without a class attribute.
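The null check can also be folded into a single call: HtmlNode.GetAttributeValue takes a default that is returned when the attribute is missing. A rough sketch of the same LINQ filter written that way follows. The names VerseExtractor and GetVerses are hypothetical, and the Trim() call is an addition of mine, not part of the original query.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class VerseExtractor
{
    // Hypothetical helper: returns the text of each <p> whose class is exactly "verse".
    public static List<string> GetVerses(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("p")
            // GetAttributeValue falls back to "" when there is no class attribute,
            // so no separate null check is needed.
            .Where(p => p.GetAttributeValue("class", "") == "verse")
            .Select(p => p.InnerText.Trim())
            .ToList();
    }
}
```

Like the query above, this compares the whole attribute value, so an element with class="verse first" would not match.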
