Reputation: 125
I'm writing a parser in scala that reads a string composed by repetitions of '+', '-', '<', '>' and '.' characters. The string may also have '[' and ']' characters and inside them there is a repetition of the first group of characters. I need a Regex that matches everything inside square brackets, the problem is that the brackets can be nested.
I've already tried with this regex: \[.*\] and many others that I've found on SO but none seems to be working.
The regex I'm looking for should work like this:
"[+++.]" matches "+++."
"[++[-]]" should match "++[-]"
edit (added a use case):
"[+++.] [++[-]]" should NOT match "+++.] [++[-]" but 2 matches of "+++." and "++[-]"
Upvotes: 2
Views: 758
Reputation: 125
After some research I think I may have found the solution, however it is not usable in Scala. What is needed is a recursive regex that matches balanced constructs, in my case:
\[(?:[+-\[\]]|(?R))*\]
and as far as I know these kind are not supported in scala, so I'll just leave this here if someone needs it for other languages.
However I solved my problem by implementing the parser in another way, I just thought that having a regex like that would have been a simpler and smoother solution. What I was implementing was a brainfuck language interpreter and here is my parser class:
class brainfuck(var pointer: Int, var array: Array[Int]) extends JavaTokenParsers {
def Program = rep(Statement) ^^ { _ => () }
def Statement: Parser[Unit] =
"+" ^^ { _ => array(pointer) = array(pointer) + 1 } |
"-" ^^ { _ => array(pointer) = array(pointer) - 1 } |
"." ^^ { _ => println("elem: " + array(pointer).toChar) } |
"," ^^ { _ => array(pointer) = readChar().toInt } |
">" ^^ { _ => pointer = pointer + 1 } |
"<" ^^ { _ => pointer = pointer - 1 } |
"[" ~> rep(block|squares) <~ "]" ^^ { items => while(array(pointer)!=0) { parseAll(Program,items.mkString) } }
def block =
"""[-+.,<>]""".r ^^ { b => b.toString() }
def squares: Parser[String] = "[" ~> rep(block|squares) <~ "]" ^^ { b => var res = "[" + b.mkString + "]"; res }
}
Upvotes: 0
Reputation: 51271
That would be pretty tough with a single regex, but with some post-processing you might get a bit closer.
def parse(s :String) :Array[String] =
"\\[(.*)\\]".r.unanchored
.findAllMatchIn(s)
.toArray
.flatMap(_.group(1).split(raw"][^\[\]]+\["))
usage:
parse("+++.]") //res0: Array[String] = Array()
parse("[+++.]") //res1: Array[String] = Array("+++.")
parse("[++[-]]") //res2: Array[String] = Array("++[-]")
parse("[+++.] [++[-]]") //res3: Array[String] = Array("+++.", "++[-]")
parse("[++[-]--] [+]") //res4: Array[String] = Array(++[-]--, +)
Upvotes: 2