M Psyllakis

Reputation: 63

What is the regular expression to match nested square bracket tags?

I created a regular expression pattern that matches square-bracket, wiki-style tags like the following:

[h1]Some content[/h1]
[b]some more content[/b]
[i]some more still[/i]

Here is a scenario:

This [b]sentence[/b] is just an [b][i]example[/i][/b].

Here is the pattern:

\[\w{1,2}\](.*?)\[\/\w{1,2}]

The thing is, sometimes the tags are nested. For example:

[b][i]nested tags content[/i][/b]

Nesting doesn't get more complicated than this. As would be expected, the pattern returns:

[b][i]nested tags content[/i]
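
For anyone who wants to reproduce this, here is a minimal sketch in Python (an assumption on my part, since the question does not name a language; `LAZY_TAG` and `first_match` are just illustrative names). The lazy `.*?` stops at the first closing tag it can find, whatever its name:

```python
import re

# The original (lazy) pattern from the question, unchanged.
LAZY_TAG = re.compile(r"\[\w{1,2}\](.*?)\[\/\w{1,2}]")

def first_match(text):
    """Return the first full match of LAZY_TAG in text, or None."""
    m = LAZY_TAG.search(text)
    return m.group(0) if m else None

print(first_match("[b][i]nested tags content[/i][/b]"))
# prints: [b][i]nested tags content[/i]
```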

What modification should I make in the pattern or what other pattern should I use for the match to capture the entire nested set?

Upvotes: 1

Views: 1204

Answers (3)

Ryszard Czech

Reputation: 18611

Use (this relies on balancing groups, which are a .NET regex feature):

(?s)\[(\w{1,2})]((?>(?<c>)\[\w{1,2}]|(?<-c>)\[/\w{1,2}]|.)*?)\[/\1]

See regex proof.

EXPLANATION

---------------------------------------------------------------------------------------------------
(?s)                         dotall mode ("." also matches newlines)
---------------------------------------------------------------------------------------------------
\[                           "[" symbol
---------------------------------------------------------------------------------------------------
(\w{1,2})                    one or two word characters (Group 1)
---------------------------------------------------------------------------------------------------
]                            "]" symbol
---------------------------------------------------------------------------------------------------
((?>(?<c>)\[\w{1,2}]|(?<-c>)\[/\w{1,2}]|.)*?)
                             nested tag part (Group 2): an opening tag pushes
                             onto group c, a closing tag pops it, and any
                             other character is consumed as-is
---------------------------------------------------------------------------------------------------
\[                           "[" symbol
---------------------------------------------------------------------------------------------------
/                            "/" symbol
---------------------------------------------------------------------------------------------------
\1                           backreference to Group 1
---------------------------------------------------------------------------------------------------
]                            "]" symbol
---------------------------------------------------------------------------------------------------
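
If your engine has no balancing groups, the backreference part of the pattern still goes a long way on its own. Below is a rough sketch in Python (my assumption; `PAIRED_TAG` and `paired_tags` are made-up names) that keeps only the `\1` idea, so the closing tag must carry the same name as the opening one. It is a simplification, not a port of the full pattern: a tag nested inside a tag of the same name, such as `[b][b]x[/b][/b]`, would still be cut short.

```python
import re

# Simplified variant: no balancing groups, only the backreference \1,
# so "[b]" can only be closed by "[/b]", "[i]" by "[/i]", and so on.
PAIRED_TAG = re.compile(r"\[(\w{1,2})\](.*?)\[/\1\]", re.DOTALL)

def paired_tags(text):
    """Return every outermost [tag]...[/tag] match found in text."""
    return [m.group(0) for m in PAIRED_TAG.finditer(text)]

print(paired_tags("This [b]sentence[/b] is just an [b][i]example[/i][/b]."))
# prints: ['[b]sentence[/b]', '[b][i]example[/i][/b]']
```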

Upvotes: 0

Patrick Hofman

Reputation: 156948

Regular expressions don't do very well under the conditions you describe. Having both nested expressions and multiple occurrences per string makes the input hard for a regular expression to parse.

It might be quite heavy to go that way, but a parser generator like ANTLR is better suited for this. And if you are capable, you can write your own simple string parser.
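
To give an idea of what a simple hand-written parser could look like, here is a minimal sketch in Python (my choice of language; `TAG` and `top_level_spans` are hypothetical names). It scans for tags and keeps a stack of the ones currently open, reporting a span whenever the stack empties again. Unmatched or mismatched closing tags are simply skipped, which may or may not be the behavior you want.

```python
import re

# A single tag token: optional "/" followed by a one- or two-character name.
TAG = re.compile(r"\[(/?)(\w{1,2})\]")

def top_level_spans(text):
    """Return the outermost balanced [tag]...[/tag] spans in text."""
    spans = []
    stack = []  # (tag name, start index) for each tag that is still open
    for m in TAG.finditer(text):
        is_closing = m.group(1) == "/"
        name = m.group(2)
        if not is_closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            if not stack:
                # The outermost tag just closed: record the whole span.
                spans.append(text[start:m.end()])
        # Any other closing tag does not match the open one and is ignored.
    return spans

print(top_level_spans("This [b]sentence[/b] is just an [b][i]example[/i][/b]."))
# prints: ['[b]sentence[/b]', '[b][i]example[/i][/b]']
```

Because nesting depth is tracked explicitly, this also copes with a tag nested inside a tag of the same name.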

Upvotes: 2

Kevin Law

Reputation: 852

Just remove the question mark; the first group will then be what you expect. The *? quantifier matches as few times as possible, expanding as needed. What you need is to match as many times as possible, which is the default (greedy) behavior:

\[\w{1,2}\](.*)\[\/\w{1,2}]
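
A quick check of the greedy version, sketched in Python (an assumption, as the question names no language; `GREEDY_TAG` and `greedy_match` are illustrative names). It captures the whole nested set when the line holds a single tag group, but note that on a line with several tag groups the greedy `.*` runs from the first opening tag to the last closing tag:

```python
import re

# Same pattern as the question, with the lazy ".*?" replaced by greedy ".*".
GREEDY_TAG = re.compile(r"\[\w{1,2}\](.*)\[\/\w{1,2}]")

def greedy_match(text):
    """Return (full match, group 1) for the first match, or None."""
    m = GREEDY_TAG.search(text)
    return (m.group(0), m.group(1)) if m else None

print(greedy_match("[b][i]nested tags content[/i][/b]"))
# prints: ('[b][i]nested tags content[/i][/b]', '[i]nested tags content[/i]')

print(greedy_match("This [b]sentence[/b] is just an [b][i]example[/i][/b].")[0])
# prints: [b]sentence[/b] is just an [b][i]example[/i][/b]
```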

Upvotes: 0
