Reputation: 63
I created a regular expression pattern that matches square-bracket, wiki-style tags like the following:
[h1]Some content[/h1]
[b]some more content[/b]
[i]some more still[/i]
Here is a scenario:
This [b]sentence[/b] is just an [b][i]example[/i][/b].
Here is the pattern:
\[\w{1,2}\](.*?)\[\/\w{1,2}]
The thing is, sometimes the tags are nested. For example:
[b][i]nested tags content[/i][/b]
Nesting doesn't get more complicated than this. As would be expected, the pattern returns:
[b][i]nested tags content[/i]
How should I modify the pattern, or what other pattern should I use, so that the match captures the entire nested set?
Upvotes: 1
Views: 1204
Reputation: 18611
Use the following pattern. It relies on balancing groups, so it needs the .NET regex engine:
(?s)\[(\w{1,2})]((?>(?<c>)\[\w{1,2}]|(?<-c>)\[/\w{1,2}]|.)*?)\[/\1]
See regex proof.
EXPLANATION
---------------------------------------------------------------------------------------------------
(?s) dotall mode
---------------------------------------------------------------------------------------------------
\[ "[" symbol
---------------------------------------------------------------------------------------------------
(\w{1,2}) one or two word characters (Group 1)
---------------------------------------------------------------------------------------------------
] "]" symbol
---------------------------------------------------------------------------------------------------
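((?>(?<c>)\[\w{1,2}]|(?<-c>)\[/\w{1,2}]|.)*?) Nested tag part (Group 2): lazily consumes an opening tag (pushing onto stack "c"), a closing tag (popping from stack "c"), or any other single character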
---------------------------------------------------------------------------------------------------
\[ "[" symbol
---------------------------------------------------------------------------------------------------
/ "/" symbol
---------------------------------------------------------------------------------------------------
\1 Backreference to Group 1
---------------------------------------------------------------------------------------------------
] "]" symbol
---------------------------------------------------------------------------------------------------
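For readers outside .NET, here is a minimal sketch of just the backreference idea in Python. Python's built-in re module has no balancing groups, so this is a simplification and not the pattern above: it only requires the closing tag name to equal the opening one. The names TAG_PAIR and outer_tags are made up for this illustration. It handles the examples in the question, but it would stop too early if a tag were nested inside a tag of the same name (for example [b] inside [b]).

```python
import re

# Backreference \1 forces the closing tag name to equal the opening one.
# re.DOTALL plays the role of (?s), so "." also matches newlines.
TAG_PAIR = re.compile(r"\[(\w{1,2})\](.*?)\[/\1\]", re.DOTALL)

def outer_tags(text):
    """Return (tag name, inner content) for each match, scanning left to right."""
    return [(m.group(1), m.group(2)) for m in TAG_PAIR.finditer(text)]

sample = "This [b]sentence[/b] is just an [b][i]example[/i][/b]."
for name, inner in outer_tags(sample):
    print(name, "->", inner)
# b -> sentence
# b -> [i]example[/i]
```

m.group(0) would give the whole tag pair including the outer tags, if that is what is needed.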
Upvotes: 0
Reputation: 156948
Regular expressions don't do very well with the conditions you set. Having both nested expressions and multiple occurrences per string makes the input hard for a regular expression to parse.
It might be quite heavy to go that way, but a parser like ANTLR is better suited for this. And if you are capable, you can write your own simple string parser.
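As a rough illustration of the "simple string parser" option, here is a minimal sketch in Python that tracks open tags on a stack. The names TOKEN and top_level_spans are hypothetical, and the handling of stray or mismatched closing tags (they are ignored) is an assumption of this sketch, not something the question specifies.

```python
import re

# Matches an opening tag like [b] or a closing tag like [/b].
TOKEN = re.compile(r"\[(/?)(\w{1,2})\]")

def top_level_spans(text):
    """Return the full text of every outermost, properly closed tag pair."""
    spans = []
    stack = []  # (tag name, start offset) of the currently open tags
    for m in TOKEN.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            if not stack:  # back at depth 0: an outermost pair just closed
                spans.append(text[start:m.end()])
        # A stray or mismatched closing tag is simply ignored in this sketch.
    return spans

print(top_level_spans("This [b]sentence[/b] is just an [b][i]example[/i][/b]."))
# ['[b]sentence[/b]', '[b][i]example[/i][/b]']
```

Because depth is tracked explicitly, this also copes with a tag nested inside a tag of the same name, which plain lazy or greedy matching cannot distinguish.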
Upvotes: 2
Reputation: 852
Just remove the question mark; the first group will then contain what you expected. The *? quantifier matches as few times as possible, expanding as needed. What you need is the default greedy behavior, which matches as many times as possible.
\[\w{1,2}\](.*)\[\/\w{1,2}]
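To make the lazy versus greedy difference concrete, here is a small sketch in Python (the variable names are only for illustration). It compares the question's pattern with the greedy variant on the nested example, and also runs the greedy variant on the question's full scenario line.

```python
import re

LAZY = re.compile(r"\[\w{1,2}\](.*?)\[/\w{1,2}\]")   # pattern from the question
GREEDY = re.compile(r"\[\w{1,2}\](.*)\[/\w{1,2}\]")  # question mark removed

nested = "[b][i]nested tags content[/i][/b]"
print(LAZY.search(nested).group(0))    # [b][i]nested tags content[/i]
print(GREEDY.search(nested).group(0))  # [b][i]nested tags content[/i][/b]

scenario = "This [b]sentence[/b] is just an [b][i]example[/i][/b]."
print(GREEDY.search(scenario).group(0))
# [b]sentence[/b] is just an [b][i]example[/i][/b]
```

Note that on a line with several tag pairs, the greedy version runs from the first opening tag to the last closing tag, so it fits best when each string holds a single outer pair.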
Upvotes: 0