Reputation: 462
I'm trying to build a recursive descent parser that builds an AST as it goes. The language is fairly simple;
x >/</= y #x, y can be any letter/number, matched by regex.
Then, AND/OR
with opening/closing brackets follow and everything repeats itself. The brackets are always where they should be, x/y are always valid.
My code:
private Node parseLogic(Node lhs) {
String curr = token.curr();
if (curr.equals("(")) {
leftCount++;
token.progress(1);
return parseExpression();
}
if (curr.equals("AND ")) { // we want to get the rhs expression so we have to reparse
token.progress(1);
Node rhs = parseExpression();
return new And(lhs, rhs);
}
if (curr.equals("OR ")) { //same
token.progress(1);
Node rhs = parseExpression();
return new Or(lhs, rhs);
}
if (curr.equals(")")){
rightCount++;
token.progress(1);
if (leftCount == rightCount){
return lhs;
}
return parseLogic(lhs);
}
else{
return null;
}
}
private Node parseExpression(){
String curr = token.curr();
if (curr.equals("(")){
leftCount++;
token.progress(1);
return parseExpression();
}
else if (exPatt.matcher(curr).matches()){ // if it matches an expression, go inside. if next is ), close it
Node node = new Expression(curr);
if (token.next().equals("AND ") || token.next().equals("OR ") ||
token.next().equals("(") || token.next().equals(")")){
token.progress(1);
return parseLogic(node);
}
return node;
}
return null;
}
public static void main(String[] args) {
Parser parse = new Parser("amount > 50 AND ((row < 20 OR team > 30) AND (test < 40 OR you < 20)))");
//Node node = parse.mainParser();
Node node = parse.parseExpression();
node.getExpression();
}
}
The code works great on examples such as
amount > 50 OR (amount < 30 AND (test < 20 OR (test > 10 OR left < 10)))
But fails in ones like
amount > 50 AND ((row < 20 OR team > 30) AND (test < 40 OR you < 20)))
Where the handling of the closed parentheses fails, and instead of having the second node be
LHS = OR (row < 20, team > 30)
RHS = OR (test < 40, you <20)
we get
LHS = row < 20
RHS = AND (team > 30 OR (test <40, you <20))
Any pointers would be greatly appreciated.
EDIT: I think I fixed it. Keeping this up incase anyone else faces a recursive descent parser with an AST, as there aren't many SO questions on this topic.
Inside parseLogic
, the and/ors instead of returning just the node, re-enter parseLogic
with the new nodes. This is to basically check if the node is the left of any other logic node. if not, in the end, lhs
is simply returned.
Upvotes: 0
Views: 788
Reputation: 718688
I think that your big mistake is that you have flattened the grammar too much. An expression grammar (that respects operator precedence1) will have separate productions for expression, and-expression, or-expression, relation and primary.
In a true recursive descent parser, your parse methods follow the shape of the grammar productions. So a recursive descent parser shouldn't need to count left and right brackets. The grammar / recursion should deal with that.
1 - I see that you intend your parser to produce an AST. If the parser doesn't understand your language precedence rules, then the AST is liable to be incorrect for some examples.
... there aren't many SO questions on this topic.
Yes. Because grammars and parsers are a complicated / advanced topic. There is generally too much information to convey in a Stack Overflow Q&A format. So most questions (e.g. "how do I write a parser for X?") end up being (rightly) closed as "needs more focus".
My recommendation is to look for a good textbook on parsers or compiler writing. There is also some good introductory material in the documentation, etc for parser generator tools.
Upvotes: 4