Jon Bleins
Jon Bleins

Reputation: 39

How to recursively parse an expression?

I'm writing a small language, and I'm really stuck on expression parsing. I've written a LR Recursive Descent Parser, it works, but now I need to parse expressions I'm finding it really difficult. I do not have a grammar defined, but if it helps, I kind of have an idea on how it works even without a grammar. Currently, my expression struct looks like this:

typedef struct s_ExpressionNode {
    Token *value;
    char expressionType;

    struct *s_ExpressionNode lhand;
    char operand;
    struct *s_ExpressionNode rhand;
} ExpressionNode;

I'm trying to get it to parse something like:

5 + 5 + 2 * (-3 / 2) * age

I was reading this article on how to parse expressions. The first grammar I tried to implement but it didn't work out too well, then I noticed the second grammar, which appears to remove left recursion. However, I'm stuck trying to implement it since I don't understand what P, B means, and also U is a - but the - is also for a B? Also I'm not sure what expect(end) is supposed to mean either.

Upvotes: 1

Views: 882

Answers (1)

John Bollinger
John Bollinger

Reputation: 181199

In the "Recursive-descent recognition" section of the article you linked, the E, P, B, and U are the non-terminal symbols in the expression grammar presented. From their definitions in the text, I infer that "E" is chosen as a mnemonic for "expression", "P" as mnemonic for "primary", "B" for "binary (operator)", and "U" for "unary (operator)". Given those characterizations, it should be clear that the terminal symbol "-" can be reduced either to a U or to a B, depending on context:

unary:   -1
binary: x-1

The expect() function described in the article is used to consume the next token if it happens to be of the specified type, or otherwise to throw an error. The end token is defined to be a synthetic token representing the end of the input. Thus

expect(end)

expresses the expectation that there are no more tokens to process in the expression, and its given implementation throws an error if that expectation is not met.

All of this is in the text, except the reason for choosing the particular symbols E, P, B, and U. If you're having trouble following the text then you probably need to search out something simpler.

Upvotes: 1

Related Questions