Kyle Gobel

Reputation: 5750

RegEx: Finding out if my match is in a <span>

Having some trouble with this one.

Trying to do some basic syntax highlighting for a custom file. Need to know if an element is inside a tag.

Some sample data

<span class="class1"> 
    Some Text <span class="class2">Some More Text</span>
    TEST
    <span>Text</span>
</span>
TEST

What I want to do here is find the occurrences of TEST that are not nested in a span tag.

The first one should not match, because it is nested inside the class1 span. The second one should match, because it isn't nested in any span tag.

I know regex is not meant to be used to parse html, but for my little situation, I thought using regex would be easiest, as I don't know another way to do what I'm looking for. I'm not against using XPath if it can solve this problem quickly.

In my code all I want is a method like this

bool InsideSpanTag(string source, int index);

This would return true if index falls between an opening and a closing span tag in the string source, and false if it doesn't.

EDIT: Never mind, I'll just count the opening and closing span tags to the left of the index and check whether the number of opening span tags is greater than the number of closing tags. Kinda quick and dirty, but it's really all I needed.
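For reference, the counting approach from the edit could look roughly like the sketch below. The class name SpanCheck and the field SpanTag are made up for illustration; only the InsideSpanTag signature comes from the question. It assumes the span tags are well formed, that no span tags appear inside comments or attribute values, and it counts a self-closing <span/> as an opening tag, so treat it as a quick-and-dirty check rather than a real parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanCheck
{
    // Matches an opening <span ...> tag or a closing </span> tag.
    // Group 1 captures the slash, so it is empty for opening tags.
    private static readonly Regex SpanTag =
        new Regex(@"<(/?)span\b[^>]*>", RegexOptions.IgnoreCase);

    public static bool InsideSpanTag(string source, int index)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        if (index < 0 || index > source.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        int opening = 0;
        int closing = 0;

        // Only tags that are complete before the index are counted.
        foreach (Match m in SpanTag.Matches(source.Substring(0, index)))
        {
            if (m.Groups[1].Length == 0)
                opening++;
            else
                closing++;
        }

        // More opening than closing tags means a span is still open here.
        return opening > closing;
    }
}
```

With the sample data above, calling InsideSpanTag with the index of the first TEST should report that it is inside a span, and calling it with the index of the last TEST should report that it is not.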

Upvotes: 1

Views: 865

Answers (1)

Anirudha

Reputation: 32807

Regex is not a good choice for parsing HTML.

HTML is neither strict nor regular in its format (XHTML being the exception).

Use HtmlAgilityPack instead.

Here's your code:

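using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(yourHtmlString);

// Select every text node that has no span ancestor at any depth
// (parent::span would only check the immediate parent).
// SelectNodes returns null when nothing matches, so guard against that.
var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::span)]");
bool valid = nodes != null && nodes.Any(p => p.InnerText.Contains("TEST"));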

Upvotes: 5
