LowRez
LowRez

Reputation: 125

Regex to match block of text inside square brackets that can be nested

I'm writing a parser in scala that reads a string composed by repetitions of '+', '-', '<', '>' and '.' characters. The string may also have '[' and ']' characters and inside them there is a repetition of the first group of characters. I need a Regex that matches everything inside square brackets, the problem is that the brackets can be nested.

I've already tried with this regex: \[.*\] and many others that I've found on SO but none seems to be working.

The regex I'm looking for should work like this:

"[+++.]" matches "+++."

"[++[-]]" should match "++[-]"

edit (added a use case):

"[+++.] [++[-]]" should NOT match "+++.] [++[-]" but 2 matches of "+++." and "++[-]"

Upvotes: 2

Views: 758

Answers (2)

LowRez
LowRez

Reputation: 125

After some research I think I may have found the solution, however it is not usable in Scala. What is needed is a recursive regex that matches balanced constructs, in my case:

\[(?:[+-\[\]]|(?R))*\]

and as far as I know these kind are not supported in scala, so I'll just leave this here if someone needs it for other languages.

However I solved my problem by implementing the parser in another way, I just thought that having a regex like that would have been a simpler and smoother solution. What I was implementing was a brainfuck language interpreter and here is my parser class:

class brainfuck(var pointer: Int, var array: Array[Int]) extends JavaTokenParsers {
    def Program = rep(Statement) ^^ { _ => () }     
    def Statement: Parser[Unit] = 
        "+" ^^ { _ => array(pointer) = array(pointer) + 1 } | 
        "-" ^^ { _ => array(pointer) = array(pointer) - 1 } | 
        "." ^^ { _ => println("elem: " + array(pointer).toChar) } | 
        "," ^^ { _ => array(pointer) = readChar().toInt } |         
        ">" ^^ { _ => pointer = pointer + 1 } |
        "<" ^^ { _ => pointer = pointer - 1 } |
        "[" ~> rep(block|squares) <~ "]" ^^ { items => while(array(pointer)!=0) { parseAll(Program,items.mkString) } }

    def block = 
        """[-+.,<>]""".r ^^ { b => b.toString() }           

    def squares: Parser[String] = "[" ~> rep(block|squares) <~ "]" ^^ { b => var res = "[" + b.mkString + "]"; res }

}

Upvotes: 0

jwvh
jwvh

Reputation: 51271

That would be pretty tough with a single regex, but with some post-processing you might get a bit closer.

def parse(s :String) :Array[String] = 
  "\\[(.*)\\]".r.unanchored
              .findAllMatchIn(s)
              .toArray
              .flatMap(_.group(1).split(raw"][^\[\]]+\["))

usage:

parse("+++.]")           //res0: Array[String] = Array()
parse("[+++.]")          //res1: Array[String] = Array("+++.")
parse("[++[-]]")         //res2: Array[String] = Array("++[-]")
parse("[+++.] [++[-]]")  //res3: Array[String] = Array("+++.", "++[-]")
parse("[++[-]--] [+]")   //res4: Array[String] = Array(++[-]--, +)

Upvotes: 2

Related Questions