Reputation: 51
I want to extract specific data from HTML. I'm using C# and HtmlAgilityPack.
Here's the HTML sample:
<p class="heading"><span>Greeting!</span>
<p class='verse'>Hi!<br> //
Hello!</p><p class='verse'>Hello!<br> // i want to get this g
Hi!</p> //
<p class="writers"><strong>WE</strong><br/>
Here is my code in C#:
StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Lyrics);
var s = doc.DocumentNode.Descendants("p");
try
{
    foreach (HtmlNode childNode in s)
    {
        pureText.Append(childNode.InnerText);
    }
}
catch
{ }
UPDATE:
StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(URL);
var s = doc.DocumentNode.SelectNodes("//p[@class='verse']"); // error
try
{
    foreach (HtmlNode childNode in s)
    {
        pureText.Append(childNode.InnerText);
    }
}
catch
{ }
ERROR:
'HtmlAgilityPack.HtmlNode' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)
Upvotes: 1
Views: 942
Reputation: 89325
You can use an XPath query to select all <p> elements having class='verse', like this:
var s = doc.DocumentNode.SelectNodes("//p[@class='verse']");
Then run the same foreach loop you already have.
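For reference, here is a minimal sketch of how the XPath version might look end to end. The sample markup and the variable names (html, verses) are assumptions made for illustration, and it presumes a build of HtmlAgilityPack in which SelectNodes is available. Note that SelectNodes returns null rather than an empty collection when nothing matches, so the sketch guards against that before looping.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

class XPathSketch
{
    static void Main()
    {
        // Hypothetical sample markup, simplified from the question's HTML.
        string html =
            "<p class=\"heading\"><span>Greeting!</span></p>" +
            "<p class='verse'>Hi!<br>Hello!</p>" +
            "<p class='verse'>Hello!<br>Hi!</p>" +
            "<p class=\"writers\"><strong>WE</strong></p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select every <p> whose class attribute is exactly "verse".
        HtmlNodeCollection verses = doc.DocumentNode.SelectNodes("//p[@class='verse']");

        var pureText = new StringBuilder();
        if (verses != null) // SelectNodes yields null when no node matches
        {
            foreach (HtmlNode verse in verses)
            {
                pureText.AppendLine(verse.InnerText);
            }
        }

        Console.WriteLine(pureText.ToString());
    }
}
```

The predicate @class='verse' is an exact string comparison, so a paragraph with class='verse first' would not match; that is fine for the HTML shown in the question.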
UPDATE I:
I don't know why the code above throws an error for you. I tested it on my PC and it works fine. Anyway, if you are willing to accept a workaround, the same query can be written without XPath this way:
var s = doc.DocumentNode.Descendants("p").Where(o => o.Attributes["class"] != null && o.Attributes["class"].Value == "verse");
This solution is longer because we need to check whether a node has a class attribute before checking the attribute's value. Otherwise, we'll get a NullReferenceException if there is any <p> without a class attribute.
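If the explicit null check feels noisy, one possible variation is to use HtmlNode.GetAttributeValue, which returns a supplied default when the attribute is missing. The sketch below is only an illustration under that assumption; the sample markup and names (html, verses) are made up for the example and are not from the question.

```csharp
using System;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

class LinqSketch
{
    static void Main()
    {
        // Hypothetical sample markup; the last <p> has no class attribute at all.
        string html =
            "<p class=\"heading\"><span>Greeting!</span></p>" +
            "<p class='verse'>Hi!<br>Hello!</p>" +
            "<p class='verse'>Hello!<br>Hi!</p>" +
            "<p>No class here</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // GetAttributeValue falls back to "" when there is no class attribute,
        // so no separate null check is needed before the comparison.
        var verses = doc.DocumentNode
            .Descendants("p")
            .Where(p => p.GetAttributeValue("class", "") == "verse");

        var pureText = new StringBuilder();
        foreach (HtmlNode verse in verses)
        {
            pureText.AppendLine(verse.InnerText);
        }

        Console.WriteLine(pureText.ToString());
    }
}
```

Like the original workaround, this needs a using System.Linq; directive for Where, and it does not depend on XPath support.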
Upvotes: 5