Reputation: 10585
cf. "Finding HTML strings in document" and similar questions.
I have seen examples of using HtmlAgilityPack to parse through a string looking for specific tags, but what if I want to make sure that the input string contains ONLY tags from a whitelist, List<string> AllowedTags?
In other words, how can I iterate over doc.DocumentNode.Descendants() to identify each tag name and check whether it is in the list?
Upvotes: 2
Views: 1488
Reputation: 4864
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

List<string> AllowedTags = new List<string>() { "br", "a" };

// Contains only <a> and <br> elements, so every tag is allowed.
HtmlDocument goodDoc = new HtmlDocument();
goodDoc.LoadHtml("<a href='asdf'>asdf</a><br /><a href='qwer'>qwer</a>");
bool containsBadTags = goodDoc.DocumentNode.Descendants()
    .Where(node => node.NodeType == HtmlNodeType.Element) // skip text and comment nodes
    .Select(node => node.Name)
    .Except(AllowedTags) // tag names that are not in the whitelist
    .Any(); // false

// Contains a <b> element, which is not in AllowedTags.
HtmlDocument badDoc = new HtmlDocument();
badDoc.LoadHtml("<a href='asdf'><b>asdf</b></a><br /><a href='qwer'>qwer</a>");
containsBadTags = badDoc.DocumentNode.Descendants()
    .Where(node => node.NodeType == HtmlNodeType.Element)
    .Select(node => node.Name)
    .Except(AllowedTags)
    .Any(); // true
Upvotes: 2
Reputation: 236208
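var allowedTags = new List<string> { "html", "head", "body", "div" };
// doc is an HtmlDocument that has already been loaded, e.g. with doc.LoadHtml(...).
bool containsOnlyAllowedTags =
    doc.DocumentNode
       .Descendants()
       .Where(n => n.NodeType == HtmlNodeType.Element) // ignore text and comment nodes
       .All(n => allowedTags.Contains(n.Name));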
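If you also want to know which tags were rejected, and not just whether any were, the same filter can be wrapped in a small helper. This is only a sketch: TagWhitelist, FindDisallowedTags and Program are hypothetical names made up for illustration, not part of HtmlAgilityPack, and comparing names case-insensitively through a HashSet is an assumption about what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TagWhitelist
{
    // Hypothetical helper: returns the distinct element names found in
    // `html` that are not present in `allowedTags`.
    public static List<string> FindDisallowedTags(string html, IEnumerable<string> allowedTags)
    {
        // Assumption: tag names should match regardless of case.
        var allowed = new HashSet<string>(allowedTags, StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element) // skip text and comment nodes
            .Select(n => n.Name)
            .Where(name => !allowed.Contains(name))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var allowedTags = new[] { "br", "a" };
        var rejected = TagWhitelist.FindDisallowedTags(
            "<a href='asdf'><b>asdf</b></a><br /><a href='qwer'>qwer</a>",
            allowedTags);

        // An empty list means the input used only whitelisted tags.
        Console.WriteLine(rejected.Count == 0
            ? "only allowed tags"
            : "disallowed: " + string.Join(", ", rejected));
    }
}
```

A HashSet keeps each membership check constant-time, which only starts to matter once the whitelist or the document gets large. For short lists, List<string>.Contains is fine.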
Upvotes: 3