Veelkoov

Reputation: 1376

C# regexp for nested tags

Let's start with a small example. I have the following text:

[[ some tag [[ with tag nested ]] and again ]]

I'd like to match [[ with tag nested ]] but not [[ some tag [[ with tag nested ]]. The simple pattern

\[\[(?<content>.+?)\]\]

obviously didn't work, because the lazy match starts at the first [[ and runs to the first ]]. So I wrote this regex:

\[\[(?!.*?\[\[.*?\]\].*?)(?<content>.+?)\]\]

Unfortunately, it doesn't match anything in C# (with RegexOptions.Singleline), while PHP's preg_match works perfectly.

Any clues/ideas? Any help would be much appreciated.

Upvotes: 2

Views: 2213

Answers (2)

Timwi

Reputation: 66573

The simplest way that I know of to find just one of the innermost brackets is this:

var match = Regex.Match(input, @"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

This works because it finds the last [[ (so there are no more [[ after it, so it can’t contain any nested tags) and then the immediately following ]]. Of course, this assumes well-formedness; if you have a string where the start/end brackets don’t match up properly, this can fail.

Once you’ve found the innermost bracket, you could remove it from the input string:

input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);

and then repeat the process in a while loop until the regular expression no longer matches.
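To make the loop concrete, here is a minimal sketch of the whole process. The class and method names (InnermostTags, ExtractAll) are made up for illustration, and like the regex itself the sketch assumes well-formed input:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class InnermostTags
{
    // The greedy ^.* backtracks from the end of the input to the last "[["
    // that is still followed by "]]". Group 1 is the whole tag, group 2 its content.
    private static readonly Regex Innermost =
        new Regex(@"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

    // Collects the content of the innermost tag, removes that tag from the
    // input, and repeats until the regex no longer matches.
    public static List<string> ExtractAll(string input)
    {
        var contents = new List<string>();
        Match match = Innermost.Match(input);
        while (match.Success)
        {
            contents.Add(match.Groups[2].Value);
            input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);
            match = Innermost.Match(input);
        }
        return contents;
    }
}
```

For the text in the question, this should yield the nested tag's content first and the outer tag's content (with the nested tag cut out) second. Each pass removes at least four characters, so the loop always terminates.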

Upvotes: 3

Alan Moore

Reputation: 75232

Would this be a valid match?

[[ with [ single ] brackets ]]

If not, this regex should do:

\[\[(?<content>[^][]*)\]\]

[^][] matches any character that's not [ or ]. If single brackets are allowed, try this:

\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]

(?!\[\[|\]\]). matches any character, but only after making sure it's not the start of a [[ or ]] sequence.
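A quick way to compare the two patterns side by side might look like the sketch below. The class and member names (InnerTagPatterns, Contents, and the two constants) are invented for the example, and the first character class is written as [^\[\]], which should be equivalent to [^][] but is easier to read:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class InnerTagPatterns
{
    // Content may not contain any '[' or ']' at all.
    public const string NoBrackets = @"\[\[(?<content>[^\[\]]*)\]\]";

    // Content may contain single brackets, but never "[[" or "]]".
    public const string SingleBracketsAllowed =
        @"\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]";

    // Returns the "content" group of every match of the given pattern.
    public static string[] Contents(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Groups["content"].Value)
                    .ToArray();
    }
}
```

On the text from the question, both patterns should find only the nested tag. On [[ with [ single ] brackets ]], only SingleBracketsAllowed should match, because NoBrackets stops at the first single bracket.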

Upvotes: 3
