user3190447
user3190447

Reputation: 51

Getting specific data from html

I want to get specific data from html. Im using c# and HtmlAgilityPack

Here's the HTML sample:

<p class="heading"><span>Greeting!</span>

<p class='verse'>Hi!<br>               //
Hello!</p><p class='verse'>Hello!<br>  // i want to get this g
Hi!</p>                                //

<p class="writers"><strong>WE</strong><br/>

Here my code in c#:

StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Lyrics);

var s = doc.DocumentNode.Descendants("p");

try
{
     foreach (HtmlNode childNode in s)
     {
                        pureText.Append(childNode.InnerText);
     }
}
catch
{ }

UPDATE:

StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(URL);

var s = doc.DocumentNode.SelectNodes("//p[@class='verse']"); // error

try
{
     foreach (HtmlNode childNode in s)
     {
            pureText.Append(childNode.InnerText);
     }
}
catch
{ }

ERROR:

'HtmlAgilityPack.HtmlNode' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)

Upvotes: 1

Views: 942

Answers (1)

har07
har07

Reputation: 89325

You can try with XPath query syntax to select all <p> having class='verse', like this :

var s = doc.DocumentNode.SelectNodes("//p[@class='verse']");

Then do the same foreach as you already have.

UPDATE I :

I don't know why the code above throwing error for you. It has been tested in my PC and should work fine. Anyway if you accept workaround, the same query can be achieved without XPath this way :

var s = doc.DocumentNode.Descendants("p").Where(o => o.Attributes["class"] != null && o.Attributes["class"].Value == "verse");

This solution is longer since we need to check if a node has class attibutes or not, before checking the attributes' value. Otherwise, we'll get Null Reference Exception if there any <p> without class attributes.

Upvotes: 5

Related Questions