newbie

Reputation: 41

Regex for extracting string exactly between {}

I am trying to extract something with the regex:

Pattern logEntry = Pattern.compile("digraph Checker \\{(.*)\\}");

for the block of text:

{ /*uninterested in this*/ 
"
digraph Checker 
{ 
/*bunch of stuff*/
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
/*bunch of stuff*/
} //first most curly brace ends, would want the regex to filter out till here, incl. the braces
"
}

and expect the output to be:

digraph Checker 
{ 
/*bunch of stuff*/
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
{
/*bunch of stuff*/
}
/*bunch of stuff*/
}

but can't seem to get rid of the last

"
}

Is there a way that I could extract this?

Upvotes: 0

Views: 409

Answers (2)

Serge Ballesta

Reputation: 149075

@anubhava showed you a clever (but complicated) regex tailored to your example. But as @sln said, regexes are not well suited to balanced (nested) elements. That is why dedicated libraries such as JSoup were developed to process markup like XML, which makes extensive use of balanced elements.

So even if it is not the answer you expected, the rule here is: do not try to use Java regexes to parse balanced elements. You may find patterns that (seem to) work in some cases, but they will break on slightly different input.

The best you can do here is to build a dedicated parser, or use one of the parser builders listed in Yacc equivalent for Java. According to that page, ANTLR should be the most popular Java tool for lexing/parsing. But if you are used to Lex/Yacc, you can also have a look at JFlex and BYACC/J, which handle that kind of parsing.
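For a case as simple as the one in the question, a "dedicated parser" can be as small as a brace counter. The following is only a minimal sketch under stated assumptions: the class name `BalancedBlockExtractor` and the method `extractBlock` are hypothetical, and the scanner assumes that braces never appear inside comments or string literals of the block (a real parser would have to skip those).

```java
final class BalancedBlockExtractor {

    /**
     * Returns the text from the first occurrence of header up to and
     * including the '}' that balances the first '{' after it, or null
     * if the header is missing or the braces never balance.
     * Assumption: braces inside comments or string literals are not
     * treated specially; every '{' and '}' is counted.
     */
    static String extractBlock(String text, String header) {
        int start = text.indexOf(header);
        if (start < 0) {
            return null;
        }
        int open = text.indexOf('{', start + header.length());
        if (open < 0) {
            return null;
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // Closing brace of the outermost block reached.
                    return text.substring(start, i + 1);
                }
            }
        }
        return null; // unbalanced input
    }

    public static void main(String[] args) {
        String input = "{ /*skip*/\n\"\ndigraph Checker\n{\n a { b { c } }\n { d }\n} // end\n\"\n}";
        System.out.println(extractBlock(input, "digraph Checker"));
    }
}
```

Unlike a fixed regex, the counter does not care how deeply the inner blocks are nested, which is the property the question needs.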

Upvotes: 1

anubhava

Reputation: 785521

You can use this regex:

Pattern logEntry = Pattern.compile("digraph Checker\\s+\\{((?:[^{]*\\{[^}]*\\})*[^}]*)\\}");
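A minimal sketch of how such a pattern might be applied with `Matcher`. The literal braces are escaped here because `java.util.regex` rejects a bare `{` outside a character class as an illegal repetition. The class name `DigraphRegexDemo` and the method `extract` are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigraphRegexDemo {

    // Matches the header, the outer braces, and inner {...} blocks
    // that are nested exactly one level deep.
    static final Pattern LOG_ENTRY = Pattern.compile(
            "digraph Checker\\s+\\{((?:[^{]*\\{[^}]*\\})*[^}]*)\\}");

    /** Returns the whole matched block (header plus braces), or null. */
    static String extract(String text) {
        Matcher m = LOG_ENTRY.matcher(text);
        // group() is the full match; group(1) would be the body
        // between the outer braces.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "{ /*uninterested*/\n\"\ndigraph Checker \n{ \n/*a*/\n"
                + "{\n/*b*/\n}\n{\n/*c*/\n}\n/*d*/\n} // end\n\"\n}";
        System.out.println(extract(input));
    }
}
```

Note that the pattern only allows inner blocks one level deep, as in the sample text. If an inner block itself contains braces, the match would stop too early.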


Upvotes: 2
