Reputation: 32768
Is it possible to remove commented-out text from HTML using the HtmlAgilityPack library? I'm currently migrating some code from ASP to ASP.NET MVC. The existing code uses Regex for this, and I'd like to know whether I can achieve the same with HtmlAgilityPack before I start trying.
Upvotes: 1
Views: 1572
Reputation: 32343
You could find all the nodes of type HtmlCommentNode (which represents an HTML comment) and remove them from the document. But note that HtmlAgilityPack treats e.g. <!DOCTYPE html> as a comment node too, so nodes like this should be skipped when deleting:
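// Requires: using System; using System.Linq; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(html); // html: the markup to clean, as a string

// Collect the comment nodes, skipping the DOCTYPE declaration,
// which HtmlAgilityPack also exposes as an HtmlCommentNode.
var comments = doc.DocumentNode.DescendantNodes()
    .OfType<HtmlCommentNode>()
    .Where(c => !c.Comment.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
    .ToList(); // materialize first so removal does not disturb the enumeration

foreach (var comment in comments)
    comment.Remove();

var result = doc.DocumentNode.InnerHtml;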
Upvotes: 3