tacos_tacos_tacos

Reputation: 10585

Use HtmlAgilityPack to determine if string contains ONLY tags from list of allowed tags

Cf. "Finding HTML strings in document" and similar questions.

I have seen examples of using HtmlAgilityPack to parse a string looking for specific tags, but what if I want to make sure that the input string contains ONLY tags from a list, List<string> AllowedTags?

In other words, how can I iterate over doc.DocumentNode.Descendants to identify the tag name and check if it is in the list?

Upvotes: 2

Views: 1488

Answers (2)

Ben Allred

Reputation: 4864

using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

List<string> AllowedTags = new List<string>() { "br", "a" };

// Only <a> and <br> elements: nothing is left after Except, so this is false.
HtmlDocument goodDoc = new HtmlDocument();
goodDoc.LoadHtml("<a href='asdf'>asdf</a><br /><a href='qwer'>qwer</a>");
bool containsBadTags = goodDoc.DocumentNode
                              .Descendants()
                              .Where(node => node.NodeType == HtmlNodeType.Element) // skip text and comment nodes
                              .Select(node => node.Name)
                              .Except(AllowedTags)
                              .Any();

// The <b> element is not in AllowedTags, so this is true.
HtmlDocument badDoc = new HtmlDocument();
badDoc.LoadHtml("<a href='asdf'><b>asdf</b></a><br /><a href='qwer'>qwer</a>");
containsBadTags = badDoc.DocumentNode
                        .Descendants()
                        .Where(node => node.NodeType == HtmlNodeType.Element)
                        .Select(node => node.Name)
                        .Except(AllowedTags)
                        .Any();

Upvotes: 2

Sergey Berezovskiy

Reputation: 236208

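var allowedTags = new List<string> { "html", "head", "body", "div" };

// doc is an HtmlDocument that has already been loaded (e.g. via LoadHtml).
// True only if every element node's name is in allowedTags.
bool containsOnlyAllowedTags =
         doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .All(n => allowedTags.Contains(n.Name));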
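
If you need this check in more than one place, the query above can be wrapped in a small helper. This is only a sketch: the TagWhitelist class and ContainsOnlyAllowedTags method are hypothetical names made up for illustration, not part of HtmlAgilityPack, and it assumes the HtmlAgilityPack NuGet package is referenced. It swaps the list for a HashSet with a case-insensitive comparer, which is a design choice rather than a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class TagWhitelist
{
    // Returns true when every element node in the parsed HTML
    // has a name that appears in allowedTags.
    public static bool ContainsOnlyAllowedTags(string html, IEnumerable<string> allowedTags)
    {
        // HashSet gives constant-time lookups; the comparer makes
        // the membership test case-insensitive ("BR" matches "br").
        var allowed = new HashSet<string>(allowedTags, StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants()
                  .Where(n => n.NodeType == HtmlNodeType.Element) // ignore text and comment nodes
                  .All(n => allowed.Contains(n.Name));
    }
}
```

For example, TagWhitelist.ContainsOnlyAllowedTags("<a href='asdf'><b>asdf</b></a><br />", new[] { "br", "a" }) returns false because of the <b> element, while the same call without the <b> wrapper returns true.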

Upvotes: 3
