Alex

Reputation: 9740

Remove all classes and ids from parsed HTML with HtmlAgilityPack

I use HtmlAgilityPack to parse an HTML page, and I extract tags from the page like this:

HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
var all_text = bodyContent.SelectNodes("//div | //ul | //p | //table");

In the returned HTML, each tag contains class and id attributes. I want to remove every id and every class. How can I do this?

Upvotes: 4

Views: 1881

Answers (1)

Ivan Vasiljevic

Reputation: 5718

Maybe you should check this link: link.

As far as I can tell, once you have an HtmlNode you can use its Attributes property. This collection has a Remove(string) method that takes the name of the attribute you want to remove. I used it like this in a small project; I am not sure if it helps you.

So basically:

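HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
// The leading "." keeps the search inside <body>; a bare "//" searches the whole document.
var all_text = bodyContent.SelectNodes(".//div | .//ul | .//p | .//table");

// SelectNodes returns null when nothing matches, so guard the loop.
if (all_text != null)
{
    foreach (var node in all_text)
    {
        node.Attributes.Remove("class");
        node.Attributes.Remove("id");
    }
}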
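
Note that the loop above only touches the div, ul, p and table elements themselves; anything nested inside them (a span, an a, a td) keeps its class and id. If the goal is to strip these attributes everywhere under the body, one possible approach is to select every element that carries either attribute. The following is a minimal sketch, not a tested drop-in: the helper name StripClassAndId and the sample markup are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

static class Program
{
    // Hypothetical helper: removes class and id from root and every element below it.
    static void StripClassAndId(HtmlNode root)
    {
        // descendant-or-self includes root itself; the predicate keeps only
        // elements that actually have a class or id attribute.
        var nodes = root.SelectNodes("descendant-or-self::*[@class or @id]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null) return;

        foreach (var node in nodes)
        {
            node.Attributes.Remove("class");
            node.Attributes.Remove("id");
        }
    }

    static void Main()
    {
        // Made-up sample markup for illustration only.
        var document = new HtmlDocument();
        document.LoadHtml(
            "<html><body class=\"page\"><div id=\"a\" class=\"box\">" +
            "<span class=\"x\">hi</span></div></body></html>");

        HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
        StripClassAndId(body);

        // Prints the body markup with the class and id attributes removed.
        Console.WriteLine(body.OuterHtml);
    }
}
```

The same Attributes.Remove(string) call from the answer does the actual work; only the XPath changes, so the selection covers nested elements as well.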

Upvotes: 5
