Sean

Reputation: 2048

Match balanced occurrences of nested tag

I have a test string:

s = "A test [[you|n|note|content of the note with a [[link|n|link|http://link]] inside]] paragraph. wef [[you|n|note|content of the note with a [[link|n|link|http://link]] inside]] test"

I need to match each outermost [[...]] block in the string. The [[ ]] tags can be nested up to two levels deep (as shown in the test string).

I started with /\[\[.*?\]\]/, but that only matches the following: [[you|n|note|content of the note with a [[link|n|link|http://link]] (it's missing the final ]] that closes the outer tag).
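The behaviour can be reproduced on a shorter string. This is a minimal sketch: the string `short` is made up for illustration and is not the test string above. The lazy `.*?` stops at the first `]]` it meets, which belongs to the inner tag, so the outer tag's closing `]]` is never part of the match.

```ruby
# Hypothetical shortened input with one nested tag.
short = "A [[outer|text with a [[inner]] inside]] end"

# The lazy quantifier ends the match at the first "]]" it finds.
matches = short.scan(/\[\[.*?\]\]/)
# => ["[[outer|text with a [[inner]]"]
```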

How do I go about matching the remainder of each [[ .. ]] block? Is this possible with regex?

Upvotes: 0

Views: 61

Answers (2)

Cary Swoveland

Reputation: 110685

Here's a non-regex solution. I've assumed that left and right brackets always appear in pairs.

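level = 0  # current nesting depth
# Visit each pair of adjacent characters along with the index of its first character.
s.each_char.each_cons(2).with_index.with_object([]) do |(pair, i), a|
  case pair.join
  when "[["
    level += 1
    a << i if level == 1      # start index of an outermost tag
  when "]]"
    a << i + 1 if level == 1  # index of the last "]" of an outermost tag
    level -= 1
  end
end.each_slice(2).map { |b, e| s[b..e] }  # pair up start/end indices and slice
  #=> ["[[you|n|note|content of the note with a [[link|n|link|http://link]] inside]]",
  #    "[[you|n|note|content of the note with a [[link|n|link|http://link]] inside]]"]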
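One caveat: each_cons(2) examines overlapping character pairs, so a run such as ]]]] (a nested tag closing directly before its parent) is read as three closers rather than two. The sketch below is one possible variant of the same depth-counting idea that consumes each delimiter whole, using StringScanner from Ruby's standard library. The helper name `top_level_tags` is hypothetical, and ignoring a stray `]]` at depth zero is an assumption, not something the question specifies.

```ruby
require "strscan"

# Hypothetical helper: returns every outermost [[...]] block in str.
def top_level_tags(str)
  scanner = StringScanner.new(str)
  depth = 0
  start = nil
  tags = []
  until scanner.eos?
    if scanner.scan(/\[\[/)
      # Remember where the outermost tag opens (charpos is a character index).
      start = scanner.charpos - 2 if depth.zero?
      depth += 1
    elsif scanner.scan(/\]\]/)
      next if depth.zero?            # assumption: ignore an unmatched closer
      depth -= 1
      tags << str[start...scanner.charpos] if depth.zero?
    else
      scanner.getch                  # any other character: skip it
    end
  end
  tags
end

top_level_tags("a [[x [[y]]]] b [[z]]")
# => ["[[x [[y]]]]", "[[z]]"]
```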

Upvotes: 1

sawa

Reputation: 168101

If the string contains no single, isolated [ or ], this is fairly simple. The following places no restriction on the nesting depth.

s.scan(/(?<match>\[\[(?:[^\[\]]|\g<match>)*\]\])/).flatten

returns:

[
  "[[you|n|note|content of the note with a [[link|n|link|http://link]] inside]]",
  "[[you|n|note|content of the note with a [[link|n|link|http://link]] inside]]"
]
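To make the recursion easier to follow, here is a sketch of the same pattern written in extended (/x) mode with a comment on each part. The constant name `NESTED_TAG` and the sample string `deep` are made up for illustration. `\g<match>` is a subexpression call: it re-applies the group named `match` at that point, which is what lets the pattern descend into nested tags.

```ruby
# The same pattern, spread out with comments.
NESTED_TAG = /
  (?<match>        # named group, so it can call itself
    \[\[           # opening delimiter
    (?:
      [^\[\]]      # one character that is not a bracket
      |
      \g<match>    # or a whole nested block, via a recursive call
    )*
    \]\]           # closing delimiter
  )
/x

# Hypothetical input nested three levels deep.
deep = "x [[a|[[b|[[c]]]]]] y [[d]]"
deep.scan(NESTED_TAG).flatten
# => ["[[a|[[b|[[c]]]]]]", "[[d]]"]
```

Because the pattern contains a capture group, scan returns an array of capture arrays, hence the flatten.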

Upvotes: 1
