SharpShade

Reputation: 2163

Match recursive pattern without recursion

I'm currently working on a simple markup parser. My first attempt was quite error-prone, so I decided to give regular expressions a shot and learnt the syntax. So far so good: I have a pattern that matches my markup.

But I just realized that my markup can be nested, which calls for recursion. I'm new to regex and don't know how to solve this without recursion (which sadly isn't available in C#).

As a brief explanation, I've got the following markup scheme:

{TagName}...Content...{/TagName} are inline markups which are used for text formatting (bold, underline or more complex types like mail links).

Like all markup they can have parameters. This works so far.

The second type are value markups which generate dynamic text during parsing:

[TagName|ParameterName:ParameterValue;...:...;...]

My current expression matches them. Parameters have their own expression pattern which is resolved at a later stage on-demand (working fine).

\[([^\|]+)\|(?<Parameters>[^\]]+)\]

What troubles me now is that I need nested markups. That is, parameter values of markups can be markups themselves, as in this example:

[PS|Data:SetToken;default:Token.SetToken([PS|Data:ClassRef], ref [PS|Data:InstanceFieldName])]

The problem is that my expression above only matches up to the closing bracket of the first nested markup, so it ends too early and breaks the parsing procedure.
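To make the failure concrete, here is a minimal sketch that runs the pattern from above against the nested example. The names `NaiveMatchDemo` and `FindMarkups` are made up for illustration and are not part of the parser.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaiveMatchDemo
{
    // The pattern from the question, unchanged.
    private const string Pattern = @"\[([^\|]+)\|(?<Parameters>[^\]]+)\]";

    // Returns the full text of every match, in order of appearance.
    public static string[] FindMarkups(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        const string input =
            "[PS|Data:SetToken;default:Token.SetToken([PS|Data:ClassRef], ref [PS|Data:InstanceFieldName])]";

        foreach (string value in FindMarkups(input))
            Console.WriteLine(value);
    }
}
```

Because `[^\]]+` stops at the first `]`, this finds two matches instead of one: the first is cut off after `[PS|Data:ClassRef]`, and the second is the inner `[PS|Data:InstanceFieldName]`.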

I figure that with recursion I could easily match the nested ones as well. For now I just need the match to extend to the correct closing bracket.

I've seen that I have the same problem with my inline markups (you usually don't nest the same tag twice, but it's still an issue).

I've read that there is a regex feature that can match equal counts of two different characters, as in "aaabbb". Could this fix it? Any other solutions?

Upvotes: 2

Views: 355

Answers (2)

SharpShade

Reputation: 2163

Thanks to this article on retkomma I found the solution. It's a lot easier than I thought. My problem was that I had misunderstood matching groups, and I didn't know how to make the match fail when the bracket counts are uneven. The article really helped with that.
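For anyone unfamiliar with the mechanism, the following sketch shows .NET balancing groups on the "aaabbb" case from the question. It is only an illustration of the counting idea, not part of my parser; the names `BalancedCountDemo` and `IsBalanced` are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCountDemo
{
    // (?<A>a)+     pushes one capture onto the stack "A" for every 'a'
    // (?<-A>b)+    pops one capture for every 'b'; fails if the stack is empty
    // (?(A)(?!))   if anything is still on "A", force the match to fail
    private static readonly Regex Balanced =
        new Regex(@"^(?<A>a)+(?<-A>b)+(?(A)(?!))$");

    public static bool IsBalanced(string input) => Balanced.IsMatch(input);

    public static void Main()
    {
        foreach (string s in new[] { "aaabbb", "aaabb", "aabbb" })
            Console.WriteLine($"{s}: {IsBalanced(s)}");
    }
}
```

Only "aaabbb" matches. With "aaabb" a capture is left on the stack, so `(?(A)(?!))` rejects it; with "aabbb" the third `b` has nothing to pop.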

Here's the pattern for my value markups:

(?<!\\)\[
(?'Tag'[^|]*)
(\|
(?'Parameters'
(?>
\[(?'A')|\](?'-A')|.?
)*
)
)?
(?(A)(?!))
(?<!\\)\]

This pattern matches the right parts, saves the right groups and furthermore allows escaping '[' or ']' with a backslash so that they are not treated as markup (in case you need to write them literally).

I'm still trying to get the inline markup regex pattern working. It mostly works already, but in some rare cases (when nested markups have the same tag name) it still fails. But the initial question is answered.
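A minimal usage sketch, under a few assumptions. Since the pattern is written across several lines, it has to be compiled with `RegexOptions.IgnorePatternWhitespace`. The sketch also uses a slightly stricter variant of the inner alternation: instead of `.?` it uses `\\.` (an escaped character) and `[^\[\]]` (any non-bracket character), so an unbalanced bracket can never be consumed as an ordinary character. That substitution, like the names `ValueMarkupDemo` and `FirstMarkup`, is only for illustration and is not the pattern posted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValueMarkupDemo
{
    private static readonly Regex ValueMarkup = new Regex(@"
        (?<!\\)\[                # opening bracket that is not escaped
        (?'Tag'[^|]*)            # tag name
        (\|
          (?'Parameters'
            (?>
                \\.              # escaped character (variant branch)
              | \[(?'A')         # nested opening bracket: push onto A
              | \](?'-A')        # nested closing bracket: pop from A
              | [^\[\]]          # any other character (variant branch)
            )*
          )
        )?
        (?(A)(?!))               # fail if a nested bracket is still open
        (?<!\\)\]                # closing bracket that is not escaped
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the first value markup found in the input.
    public static Match FirstMarkup(string input) => ValueMarkup.Match(input);

    public static void Main()
    {
        const string input =
            "[PS|Data:SetToken;default:Token.SetToken([PS|Data:ClassRef], ref [PS|Data:InstanceFieldName])]";

        Match m = FirstMarkup(input);
        Console.WriteLine("Tag: " + m.Groups["Tag"].Value);
        Console.WriteLine("Parameters: " + m.Groups["Parameters"].Value);
    }
}
```

For the nested example this yields the tag `PS` and the whole parameter string, including both inner markups. The inner markups are not matched separately, so resolving them means running the same expression again on the `Parameters` value.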

Upvotes: 0

Aydin

Reputation: 15294

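Is something like this what you were looking for?

// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = "[PS|Data:SetToken;default:Token.SetToken([PS|Data:ClassRef], ref [PS|Data:InstanceFieldName])]";
var pattern = @"\[([^\|]+)\|(?<Parameters>[^\]]+)\]";

// Project each match into its tag name (group 1) and parameter string (group 2).
var matches = Regex.Matches(input, pattern, RegexOptions.Multiline)
                   .Cast<Match>()
                   .Select(match => new
                   {
                       First = match.Groups[1].Value,
                       Second = match.Groups[2].Value
                   });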

Regex Output

Upvotes: 1
