lostknight

Reputation: 109

C# HTML agility pack, pulling plain text from a div

I am trying to pull short blurbs from a site (LoL).

HTML of what I am trying to pull is below.

<div class="field field-name-field-body-medium field-type-text-long field-label-hidden">
The community comics collaboration is back for another heaping helping of Academy fun!
</div>

Here is the code I am currently using, which is not working:

var shortBio = doc.DocumentNode.Descendants("div").Where(p => p.Attributes.Contains("class") && p.Attributes["class"]
         .Value.Contains("field field - name - field - body - medium field - type - text - long field - label - hidden"));


 for (int i = 0; i < 5; i++)
     {
         blurbs[i] = shortBio.ElementAt(i).ToString();
     }

Obviously this is not working, and I am not sure how to pull the text. I keep finding info on just pulling

Thank you in advance.

Upvotes: 2

Views: 728

Answers (1)

har07

Reputation: 89295

It looks like the parent of your target div has the class teaser-content, which makes a good identifier. The following XPath should return the div you want:

//div[@class='teaser-content']/div

You can then read the div's text from its InnerText property. For example (to get all matching divs instead of just the first one, replace SelectSingleNode() with SelectNodes() and iterate through the result):

var doc = new HtmlWeb().Load("http://na.leagueoflegends.com/en/news/");
var div = doc.DocumentNode.SelectSingleNode("//div[@class='teaser-content']/div");
Console.WriteLine(div.InnerText);

dotnetfiddle demo

Output:

The community comics collaboration is back for another heaping helping of Academy fun!
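
For the SelectNodes() variant mentioned above, something along these lines might work. It is only a sketch: the BlurbExtractor class and ExtractBlurbs method are hypothetical names made up for illustration, and it assumes every blurb div sits directly under a div whose class attribute is exactly teaser-content, as in the XPath above. It parses an HTML string with LoadHtml() so it can be tried without hitting the live page.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack itself.
public static class BlurbExtractor
{
    // Returns the trimmed text of up to maxCount blurb divs found in the HTML.
    public static List<string> ExtractBlurbs(string html, int maxCount)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: each blurb div is a direct child of a div with class 'teaser-content'.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='teaser-content']/div");

        // SelectNodes() returns null rather than an empty collection when nothing matches.
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes
            .Take(maxCount) // avoids indexing past the end if fewer divs are found
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()) // decode entities, drop surrounding whitespace
            .ToList();
    }
}
```

Calling BlurbExtractor.ExtractBlurbs(html, 5) would stand in for the for loop in the question. Using Take() instead of ElementAt(i) means a page with fewer than five blurbs returns a shorter list rather than throwing.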

Upvotes: 1
