Reputation: 109
I am attempting to pull short blurbs from a site (LoL).
The HTML I am trying to pull is below:
<div class="field field-name-field-body-medium field-type-text-long field-label-hidden">
The community comics collaboration is back for another heaping helping of Academy fun!
</div>
This is the code I am currently using, which is not working:
var shortBio = doc.DocumentNode.Descendants("div")
    .Where(p => p.Attributes.Contains("class") && p.Attributes["class"]
        .Value.Contains("field field - name - field - body - medium field - type - text - long field - label - hidden"));
for (int i = 0; i < 5; i++)
{
    blurbs[i] = shortBio.ElementAt(i).ToString();
}
Obviously this is not working, and I am not sure how to pull the text. I keep finding info on just pulling
Thank you in advance.
Upvotes: 2
Views: 728
Reputation: 89295
It looks like the parent of your target div is given the class teaser-content, which can be a good identifier. The following XPath should return the wanted div:
//div[@class='teaser-content']/div
Then you can get the text content of the div from its InnerText property, for example (replace SelectSingleNode() with SelectNodes() and iterate through the result if you want all divs instead of just the first one):
var doc = new HtmlWeb().Load("http://na.leagueoflegends.com/en/news/");
var div = doc.DocumentNode.SelectSingleNode("//div[@class='teaser-content']/div");
Console.WriteLine(div.InnerText);
Output:
The community comics collaboration is back for another heaping helping of Academy fun!
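If you want several blurbs rather than only the first, the SelectNodes() variant mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the BlurbScraper class and ExtractBlurbs helper are hypothetical names, and the inline sample HTML assumes the target div sits directly inside a div with class teaser-content, as described above. It parses a string with LoadHtml() so it runs without hitting the live page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class BlurbScraper
{
    // Hypothetical helper: returns the text of up to `max` divs that are
    // direct children of a div whose class is exactly "teaser-content".
    public static List<string> ExtractBlurbs(string html, int max)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var blurbs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='teaser-content']/div");
        if (nodes == null)
        {
            return blurbs;
        }

        foreach (var node in nodes)
        {
            if (blurbs.Count >= max)
            {
                break;
            }
            // InnerText holds the node's text; decode entities and trim whitespace.
            blurbs.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return blurbs;
    }

    public static void Main()
    {
        // Assumed page structure, for illustration only.
        const string html = @"
            <div class='teaser-content'>
              <div class='field field-name-field-body-medium field-type-text-long field-label-hidden'>
                The community comics collaboration is back for another heaping helping of Academy fun!
              </div>
            </div>";

        foreach (var blurb in ExtractBlurbs(html, 5))
        {
            Console.WriteLine(blurb);
        }
    }
}
```

Collecting into a List<string> instead of indexing a fixed array with ElementAt(i) avoids an exception when the page has fewer than five matches. To run it against the real page, you would pass the document loaded by HtmlWeb().Load(...) instead of a string.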
Upvotes: 1