Reputation: 105563
I have the following grammar:
IdentifierName ::
    IdentifierStart
    IdentifierName IdentifierPart
which, for the input git, should be parsed into the following parse tree:
                       IdentifierName
                      /              \
            IdentifierName       IdentifierPart
            /            \             |
   IdentifierName   IdentifierPart    't'
         |                |
   IdentifierStart       'i'
         |
        'g'
I want to write a recursive descent algorithm to do that. I have two options: a recursive descent parser with backtracking, or a predictive recursive descent parser. Neither of these is a table-driven parser. However, I've read that for recursive descent with backtracking I need to eliminate left recursion, and the grammar in the question appears to be left-recursive.
So am I right that I either need to refactor the grammar or use a predictive algorithm?
Upvotes: 1
Views: 614
Reputation: 370465
Yes, the grammar is left-recursive and thus not LL. Neither backtracking nor predictive LL-parsers can handle such a grammar. So you'd either need to change the grammar or use another algorithm such as an LR-parsing algorithm.
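The left recursion can be eliminated mechanically: IdentifierName :: IdentifierStart IdentifierPart* derives exactly the same strings, and the repetition becomes a simple loop. A minimal sketch in Java (the isIdentifierStart/isIdentifierPart predicates here are simplified stand-ins for illustration, not the full ECMAScript definitions):

```java
// Sketch: recognizing IdentifierName after eliminating left recursion.
// IdentifierName :: IdentifierStart IdentifierRest
// IdentifierRest :: IdentifierPart IdentifierRest | (empty)
public class IdentifierRecognizer {
    // Simplified predicates; the real ECMAScript rules are broader.
    static boolean isIdentifierStart(char c) {
        return Character.isLetter(c) || c == '_' || c == '$';
    }

    static boolean isIdentifierPart(char c) {
        return isIdentifierStart(c) || Character.isDigit(c);
    }

    public static boolean isIdentifierName(String s) {
        // IdentifierStart: exactly one required character up front.
        if (s.isEmpty() || !isIdentifierStart(s.charAt(0))) {
            return false;
        }
        // IdentifierRest: the right-recursive tail turns into a loop.
        for (int i = 1; i < s.length(); i++) {
            if (!isIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifierName("git"));   // true
        System.out.println(isIdentifierName("1git"));  // false
    }
}
```

Note that the loop builds the match left-to-right, so if you need the original left-leaning parse tree you can rebuild it bottom-up as you consume each IdentifierPart.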
Note that this grammar is regular, so it can actually be translated to a regular expression or directly into a finite automaton.
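For example, a simplified version of this identifier language (ASCII letters, digits, underscore, and dollar sign only; the real ECMAScript rules also admit Unicode escapes and further character categories) translates directly to a regular expression:

```java
import java.util.regex.Pattern;

// Sketch: the identifier grammar as a regex. The first character
// class plays the role of IdentifierStart, the starred class the
// role of the repeated IdentifierPart.
public class IdentifierRegex {
    static final Pattern IDENT =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    public static boolean matches(String s) {
        return IDENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("git"));     // true
        System.out.println(matches("9lives"));  // false
    }
}
```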
When writing a real-world implementation of JavaScript, lexical rules such as this one would be handled in the lexer, and only the syntactic rules would be handled by the parser (however, those are specified left-recursively as well, so they'd also have to be rewritten to be parsed by an LL-parser).
Upvotes: 3
Reputation: 1236
This tends to be the work of the lexer, not the parser. Normally, a lexer proceeds one character at a time, in a loop, with a big switch statement (or the equivalent "initial character table" if data-driven.)
// near the end of the big "switch (ch) {" statement ...
default:
    if (!isIdentifierStart(chInit))
    {
        log(Severity.ERROR, ILLEGAL_CHAR, new Object[]{quotedChar(chInit)},
            lInitPos, source.getPosition());
    }
    // fall through
case 'A':case 'B':case 'C':case 'D':case 'E':case 'F':case 'G':
case 'H':case 'I':case 'J':case 'K':case 'L':case 'M':case 'N':
case 'O':case 'P':case 'Q':case 'R':case 'S':case 'T':case 'U':
case 'V':case 'W':case 'X':case 'Y':case 'Z':
case 'a':case 'b':case 'c':case 'd':case 'e':case 'f':case 'g':
case 'h':case 'i':case 'j':case 'k':case 'l':case 'm':case 'n':
case 'o':case 'p':case 'q':case 'r':case 's':case 't':case 'u':
case 'v':case 'w':case 'x':case 'y':case 'z':
case '_':
    {
    // consume IdentifierPart characters until a non-matching char
    while (source.hasNext())
        {
        if (!isIdentifierPart(nextChar()))
            {
            source.rewind();
            break;
            }
        }
    String name = source.toString(lInitPos, source.getPosition());
    // ...
    }
If building by hand, I find it far easier to have a dedicated lexer (producing tokens from a stream of chars) and parser (producing an AST from a stream of tokens) than to try to combine those into one parser.
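A minimal sketch of that split, assuming a toy language of identifiers and single-character punctuation (the Token shape and the lex method here are illustrative, not taken from any particular implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the lexer turns a stream of chars into tokens; a parser
// built on top of it would consume whole IDENT tokens and never see
// individual identifier characters.
public class TinyLexer {
    record Token(String kind, String text) {}

    public static List<Token> lex(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                         // skip whitespace
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;               // IdentifierStart
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i))
                            || src.charAt(i) == '_')) {
                    i++;                     // IdentifierPart*
                }
                tokens.add(new Token("IDENT", src.substring(start, i)));
            } else {
                tokens.add(new Token("PUNCT", String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("git init"));
    }
}
```

The parser then works on List&lt;Token&gt; (or an equivalent token stream) and stays free of character-level concerns.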
Upvotes: 0