Reputation: 165
I have an expression as a string below:
(((status==SUBMITTED) && (submit_date>2020-01-03)) &&(dueDate<(proof_date+1)))
I wanted to identify all the inner expressions in the above string as below:
(status==SUBMITTED)
(submit_date>2020-01-03)
(dueDate<(proof_date+1)
((status==SUBMITTED) && (submit_date>2020-01-03))
((status==SUBMITTED) && (submit_date>2020-01-03)) &&(dueDate<(proof_date+1))
Below is my code written in Java:
String expression = "((status==SUBMITTED)&&(submit_date>2020-01-03)&&(dueDate<(proof_date+1)))";
//Matcher m = Pattern.compile("\\((.*?)\\)").matcher(expression);
//Matcher m = Pattern.compile("\\([^()]*\\)").matcher(expression);
Matcher m = Pattern.compile("\\(([^()]*|\\([^()]*|\\))*\\)").matcher(expression);
while(m.find()) {
System.out.println("Group Result: "+m.group(0));
}
However I am not getting the combinations and its only showing the below values:
Group Result: ((status==SUBMITTED)
Group Result: (submit_date>2020-01-03)
Group Result: (dueDate<(proof_date+1)
How can I get all the above combinations. Is there a Regex that can identify this properly?
Upvotes: 2
Views: 37
Reputation: 4592
Regular expressions simply are not powerful enough to solve this problem in general. Think of it like trying to fight a five-alarm inferno with a garden hose.
The best you could hope for with regular expressions would be a pattern that can find parenthesized expressions with a specific level of nesting. For example, recognizing ((...)...(...))
should be possible, or (((...)...(...))..((...)..(...)))
. But even something as deceptively simple as (((...)..(...))...(...))
would be difficult, if not impossible.
To handle arbirary expressions, you need a more powerful technique than regular expressions. The next step up from regular expressions in the formal hierarchy of grammars is known as Context-free grammars (CFGs); they're powerful enough to handle any number of types of nested structures, properly nested within each other, and nested to arbitrary depths.
However, for just parenthesized expressions, you don't actually need to unleash the full power of a CFG parser. All you need is a stack, to keep track of left-parens for which you haven't yet seen a matching right-paren:
String expression = "((status==SUBMITTED)&&(submit_date>2020-01-03)&&(dueDate<(proof_date+1)))";
Stack<Integer> lpar = new Stack<>();
for (int i = 0; i < expression.length(); ++i) {
char c = expression.charAt(i);
if (c == '(') {
lpar.push(i);
} else if (c == ')') {
if (lpar.isEmpty()){
System.out.println("Unbalanced )");
break;
}
int start = lpar.pop();
System.out.println(expression.substring(start, i+1));
}
}
if (!lpar.isEmpty()){
System.out.println("Missing )");
}
Upvotes: 2