Reputation: 10033
I'm trying to select all elements that have a given class and remove them from an HTML string.
This is what I have so far. It doesn't seem to remove anything, although the source clearly shows 4 elements with that class name.
// Filter page HTML to display required content
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
// pageHTML is a string containing the HTML
htmlDoc.LoadHtml(pageHTML);
// ParseErrors contains any errors from the LoadHtml call
if (!htmlDoc.ParseErrors.Any())
{
    // Remove all elements marked with pdf-ignore class
    HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//body[@class='pdf-ignore']");
    // Remove the collection from above
    foreach (var node in nodes)
    {
        node.Remove();
    }
}
EDIT: Just to clarify, the document parses and the SelectNodes line is hit; it just doesn't return anything.
Here is a snippet of the html:
<input type=\"submit\" name=\"ctl00$MainContent$PrintBtn\" value=\"Print Shotlist\" onclick=\"window.print();\" id=\"MainContent_PrintBtn\" class=\"pdf-ignore\">
Upvotes: 0
Views: 1079
Reputation: 32323
EDIT: in your updated question you posted part of the HTML string, an <input> element declaration, but you're trying to match a <body> element with the class pdf-ignore (according to your expression //body[@class='pdf-ignore']).
If you want to match all the elements in the document with this class, you should use this to get your nodes:
var nodes = htmlDoc.DocumentNode.SelectNodes("//*[contains(@class,'pdf-ignore')]");
This will match all the elements with the class name specified.
Your code seems to be correct except for one detail: the condition htmlDoc.ParseErrors == null. You select and remove nodes ONLY if the ParseErrors property (which is of type IEnumerable<HtmlParseError>) is null, but if no errors are found this property actually returns an empty list. So changing your code to:
if (!htmlDoc.ParseErrors.Any())
{
    // some logic here
}
should solve the issue.
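For reference, here is a minimal sketch of the select-and-remove step, assuming the HtmlAgilityPack package is referenced. The names PdfFilter and RemoveByClass are hypothetical, made up for illustration. It differs from the one-liner above in two ways: it matches the class as a whole token (contains(@class, ...) is a substring test, so it would also match a class such as pdf-ignore-2), and it guards against SelectNodes returning null, which it does when nothing matches. The ParseErrors check is left out for brevity.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class PdfFilter
{
    // Hypothetical helper: removes every element whose class list contains className.
    // Assumes className contains no quote characters, since it is spliced into the XPath.
    public static string RemoveByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class attribute with spaces so only whole class tokens match.
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        // SelectNodes returns null, not an empty collection, when there is no match.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null)
        {
            // Copy the matches to a list before modifying the tree.
            foreach (HtmlNode node in nodes.ToList())
            {
                node.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling PdfFilter.RemoveByClass(pageHTML, "pdf-ignore") would then return the markup without the flagged elements, including the <input> button from the question.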
Upvotes: 2
Reputation: 6216
Your XPath is probably not matching: have you tried "//div[class='pdf-ignore']" (no "@")?
Upvotes: 0