Ali Ismayilov

Reputation: 1755

Scala regex - How to match inside of curly braces but exclude the curly braces themselves

I have some text like this:

text {text10}
text {text1, text9}
anotherText [
{text2, text5}
{text3, text6}
{test4, text8}
]

This regex matches everything I want:

val regex =  """(.*?) (\[.*?\]|\{(.*?)\})""".r

However, I have a small issue: I don't want to match the braces themselves. Currently I get this output:

val line = regex findAllIn configByLines
line.matchData foreach {
  m => println("output: "+m.group(2))
}
#output: {text10}
#output: {text1, text9}
#output: [{text2, text5} {text3, text6} {test4, text8}]

But I would like group(2) to produce:

#output: text10
#output: text1, text9
#output: {text2, text5} {text3, text6} {test4, text8}

How can I fix my regex?

Upvotes: 3

Views: 2335

Answers (3)

Régis Jean-Gilles

Reputation: 32719

It is very much doable, though you might want to make sure you really need to do it using regex, as the result isn't pretty and is pretty much unmaintainable:

val regex =  """[^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]""".r

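The main trick is to use a zero-width positive lookbehind such as (?<=\{). The opening '{' or '[' is consumed before the capturing group starts, and the lookbehind only checks which of the two it was, so the group captures the content up to the corresponding closing delimiter without including the delimiter itself.

The matched text is in group 1.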

Mandatory REPL session:

scala> val configByLines = """text {text10}
     | text {text1, text9}
     | anotherText [
     | {text2, text5}
     | {text3, text6}
     | {test4, text8}
     | ]"""
configByLines: String =
text {text10}
text {text1, text9}
anotherText [
{text2, text5}
{text3, text6}
{test4, text8}
]

scala> val regex =  """[^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]""".r
regex: scala.util.matching.Regex = [^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]

scala> val line = regex findAllIn configByLines.replace("\n", " ")
line: scala.util.matching.Regex.MatchIterator = non-empty iterator

scala> line.matchData foreach {
     |   m => println("output: "+m.group(1))
     | }
output: text10
output: text1, text9
output:  {text2, text5} {text3, text6} {test4, text8}
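
If you want to reuse this outside the REPL, the session above could be wrapped roughly as follows. This is only a sketch: the object name BracketContents and the method name extract are made up for illustration, and the trim call is an addition to drop the padding left by the replaced newlines.

```scala
import scala.util.matching.Regex

object BracketContents {
  // Regex copied from the answer above; group 1 holds the text between the delimiters.
  val regex: Regex = """[^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]""".r

  // Hypothetical helper: flatten the input to one line, then collect group 1
  // of every match. trim removes the spaces that replaced the newlines.
  def extract(config: String): List[String] =
    regex.findAllMatchIn(config.replace("\n", " ")).map(_.group(1).trim).toList

  def main(args: Array[String]): Unit = {
    val configByLines =
      """text {text10}
        |text {text1, text9}
        |anotherText [
        |{text2, text5}
        |{text3, text6}
        |{test4, text8}
        |]""".stripMargin
    extract(configByLines).foreach(s => println("output: " + s))
  }
}
```

Keeping the regex in one named place at least limits the maintenance problem to a single definition.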

Upvotes: 3

pndc

Reputation: 3795

Regular expressions are overkill for this; they're used in Perl for this kind of parsing because the regex engine is powerful and brings performance benefits, but in the JVM you don't really win anything by using regexes unless you actually need their power. So I recommend manual parsing for this specific example.

Take your string and split it on opening braces:

scala> "anotherText [{text2} {text3}]" split '{'
res1: Array[String] = Array(anotherText [, "text2} ", text3}])

Throw away the first element since that was not preceded by an opening brace:

scala> ("anotherText [{text2} {text3}]" split '{').tail
res2: Array[String] = Array("text2} ", text3}])

This will still work even if the string starts with an opening brace, because the split will generate an empty first element.

Now you can process the array splitting on the closing brace and taking the part before the brace:

scala> ("anotherText [{text2} {text3}]" split '{').tail map (_.split('}').head)
res3: Array[String] = Array(text2, text3)

Note that this is not at all robust against unbalanced braces, which includes cases where the brace-enclosed string itself contains braces. Experiment with my final example against some such strings to see how it fails. To handle them, you'll need to build a (trivial) parser and decide how you're going to escape or otherwise encode embedded braces. The same applies if your example is actually a simplified version of a rather more complex language.
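
As a rough idea of what such a trivial parser might look like, here is a sketch that tracks nesting depth. The names BraceParser and topLevelGroups are invented for this example, and it assumes nested braces should be kept verbatim inside the enclosing group. It does not attempt any escaping scheme.

```scala
object BraceParser {
  // Returns the content of every top-level {...} group, keeping nested
  // braces as part of the content. Returns None if the braces are unbalanced.
  def topLevelGroups(input: String): Option[List[String]] = {
    val groups = List.newBuilder[String]
    val current = new StringBuilder
    var depth = 0
    var balanced = true
    var i = 0
    while (balanced && i < input.length) {
      val c = input.charAt(i)
      if (c == '{') {
        if (depth > 0) current.append(c) // nested opener belongs to the content
        depth += 1
      } else if (c == '}') {
        if (depth == 0) balanced = false // closer without a matching opener
        else {
          depth -= 1
          if (depth == 0) { groups += current.toString; current.clear() }
          else current.append(c)         // nested closer belongs to the content
        }
      } else if (depth > 0) {
        current.append(c)                // text outside any group is ignored
      }
      i += 1
    }
    if (balanced && depth == 0) Some(groups.result()) else None
  }
}
```

Returning an Option makes the unbalanced case explicit instead of silently producing partial results, which is the failure mode of the split-based version.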

Upvotes: -2

Avinash Raj

Reputation: 174706

You could use the \G anchor if Scala supports this feature.

(?:^(.*?) \[?|(?<!^)\G){?([\w]*)}?

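
On the question of support: Scala's Regex is backed by java.util.regex, which documents \G as matching at the end of the previous match. The sketch below only demonstrates the anchor on a simplified item list; it is not the expression above, and the names GAnchorDemo and leadingItems are made up. Also note that java.util.regex, as far as I know, rejects an unescaped { with an "Illegal repetition" error, so the braces in the expression above would probably need escaping before trying it in Scala.

```scala
object GAnchorDemo {
  // \G forces each match to begin exactly where the previous one ended
  // (or at the start of the input for the first match), so scanning stops
  // at the first position where the pattern no longer fits.
  private val item = """\G(\w+),?\s*""".r

  // Hypothetical helper: collect the leading run of comma-separated words.
  def leadingItems(s: String): List[String] =
    item.findAllMatchIn(s).map(_.group(1)).toList
}
```

For example, GAnchorDemo.leadingItems("text1, text9 !! text3") yields List("text1", "text9"): the "!!" breaks the chain, so "text3" is never reached.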

Upvotes: 0
