Aaron Silverman
Reputation: 22655

How to comment out all script tags in an HTML document using HTML Agility Pack

I would like to comment out all script tags in an HtmlDocument. That way, the scripts are not executed when I render the document, but we can still see what was there. Unfortunately, my current approach is failing:

foreach (var scriptTag in htmlDocument.DocumentNode.SelectNodes("//script"))
{
    var commentedScript = new HtmlNode(HtmlNodeType.Comment, htmlDocument, 0) { InnerHtml = scriptTag.ToString() };
    scriptTag.ParentNode.AppendChild(commentedScript);
    scriptTag.Remove();
}

Note that I could do this with string replacement on the raw HTML, but I do not think that would be as robust:

domHtml = domHtml.Replace("<script", "<!-- <script");
domHtml = domHtml.Replace("</script>", "</script> -->");

Upvotes: 3

Views: 2494

Answers (2)

Jaans
Reputation: 4638

Refer to the SO post "htmlagilitypack - remove script and style?" for a very clean solution that uses the LINQ query support of the HTML Agility Pack.

Upvotes: 0

IUnknown
Reputation: 22478

Try this:

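// SelectNodes may return null (rather than an empty collection) when nothing matches.
var scriptTags = htmlDocument.DocumentNode.SelectNodes("//script");
if (scriptTags != null)
{
    foreach (var scriptTag in scriptTags)
    {
        // Wrap the script's full markup in an HTML comment and swap it in at the same position.
        var commentedScript = HtmlTextNode.CreateNode(string.Format("<!--{0}-->", scriptTag.OuterHtml));
        scriptTag.ParentNode.ReplaceChild(commentedScript, scriptTag);
    }
}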
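
One caveat with wrapping OuterHtml in a comment: if the script text itself contains "-->" (for example `i-->0`, or an old-style `<!-- ... //-->` guard), the comment ends early and the rest of the script leaks back into the page. A minimal sketch of one way to guard against that is below. ScriptCommenter and ToSafeComment are hypothetical names, not part of HTML Agility Pack, and breaking up every "--" is a deliberately conservative choice that slightly alters the preserved text:

```csharp
using System;

// Hypothetical helper, not part of HTML Agility Pack.
static class ScriptCommenter
{
    // Wraps markup in an HTML comment after breaking up every "--",
    // so the markup cannot terminate the comment early.
    public static string ToSafeComment(string outerHtml)
    {
        string neutralized = outerHtml;

        // A single Replace turns "---" into "- --", which still contains "--",
        // so repeat until no "--" is left.
        while (neutralized.Contains("--"))
        {
            neutralized = neutralized.Replace("--", "- -");
        }

        // The padding spaces keep a leading or trailing "-" in the markup
        // from merging with the comment delimiters.
        return "<!-- " + neutralized + " -->";
    }
}
```

In the loop above, this would be used as HtmlTextNode.CreateNode(ScriptCommenter.ToSafeComment(scriptTag.OuterHtml)) in place of the string.Format call. If the original text must be recoverable exactly, a reversible encoding would be a better fit than this replacement.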

Upvotes: 5
