Reputation: 536
I want to extract the text "Some Text Goes here..." from inside the div with class productDescriptionWrapper. I am using Html Agility Pack with C#.
<div class="productDescriptionWrapper">
Some Text Goes here...
<div class="emptyClear"> </div>
</div>
This is what I have:
Description = doc.DocumentNode.SelectNodes("//div[@class=\"productDescriptionWrapper\").Descendants("div").Select(x => x.InnerText).ToList();
I get this error:
An unhandled exception of type 'System.NullReferenceException'
I know how to extract the text when it is inside an <h1> or a <p>: instead of "div" in Descendants, I would pass "h1" or "p".
Can somebody please assist?
Upvotes: 0
Views: 3730
Reputation: 89335
You should not get a NullReferenceException if doc is created from the HTML snippet you posted. Anyway, if you meant to get the text within the outer <div>, but not from the inner one, then use the XPath step /text(), which means "get the direct child text nodes".
For example, given this HTML snippet:
var html = @"<div class=""productDescriptionWrapper"">
Some Text Goes here...
<div class=""emptyClear"">Don't get this one</div>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
...this expression returns text from the outer <div> only:
var Description = doc.DocumentNode
.SelectNodes("//div[@class='productDescriptionWrapper']/text()")
.Select(x => x.InnerText.Trim())
.First();
//Description :
//"Some Text Goes here..."
...while, in contrast, the following returns all the text:
var Description = doc.DocumentNode
.SelectNodes("//div[@class='productDescriptionWrapper']")
.Select(x => x.InnerText.Trim())
.First();
//Description :
//"Some Text Goes here...
//Don't get this one"
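As a side note on the NullReferenceException from the question: by default, SelectNodes returns null rather than an empty collection when the XPath matches nothing, so chaining LINQ calls directly onto it can throw. Below is a minimal sketch of a null-safe variant of the first expression. The class name DescriptionExtractor and the method GetDescription are hypothetical names made up for illustration, and returning an empty string on no match is just one possible choice.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DescriptionExtractor
{
    // Hypothetical helper (not part of Html Agility Pack): returns the first
    // non-blank direct text child of the wrapper div, or "" if there is none.
    public static string GetDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // By default SelectNodes yields null, not an empty collection,
        // when no node matches the XPath expression.
        var textNodes = doc.DocumentNode
            .SelectNodes("//div[@class='productDescriptionWrapper']/text()");
        if (textNodes == null)
        {
            return string.Empty;
        }

        // Skip whitespace-only text nodes (e.g. the newline after the inner div).
        return textNodes
            .Select(n => n.InnerText.Trim())
            .FirstOrDefault(t => t.Length > 0) ?? string.Empty;
    }
}
```

Calling DescriptionExtractor.GetDescription(html) on a page that has no productDescriptionWrapper div then gives back an empty string instead of throwing.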
Upvotes: 1
Reputation: 2372
Use single quotes inside the XPath string, for example:
//div[@class='productDescriptionWrapper']
To get all descendants of every element type, use:
//div[@class='productDescriptionWrapper']//*
To get all descendants of a specific type, such as p, use:
//div[@class='productDescriptionWrapper']//p
To get all descendants that are either a div or a p, use:
//div[@class='productDescriptionWrapper']//*[self::div or self::p]
To get all non-blank descendant text nodes, use:
//div[@class='productDescriptionWrapper']//text()[normalize-space()]
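For what it's worth, here is a rough sketch of how that last expression could be used from C# with Html Agility Pack. The class TextNodeCollector and the method NonBlankTexts are hypothetical names for illustration only, and the null check assumes the default behaviour where SelectNodes returns null when nothing matches.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TextNodeCollector
{
    // Hypothetical helper: collects the trimmed text of every non-blank
    // text node anywhere below the wrapper div, in document order.
    public static List<string> NonBlankTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The predicate [normalize-space()] drops whitespace-only text nodes.
        var nodes = doc.DocumentNode.SelectNodes(
            "//div[@class='productDescriptionWrapper']//text()[normalize-space()]");
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }
}
```

Unlike the /text() form, which only looks at direct children, this also picks up text from nested elements such as the inner div.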
Upvotes: 1