user901037
user901037

Reputation:

Combining top down and bottom up parsing

I am creating a parser for a language similar to C#, and decided that a combination of top-down and bottom-up would be best. Basically, I would like to use top-down parsing everywhere except individual statements. The reason for this is order of operations- its a lot easier to implement them with a bottom up, and it is a lot easier to implement everything else with a top-down parser.

The actual implementation of this is as follows: Start with top-down parsing, matching like normal. Then, in one of the rules, there is a rule that is marked as "bottom-up". X tokens are taken out of the input lex array, "crunched" with bottom-up (aka shift-reduce) into a single token, and that token is placed into the top-down match. The top-down continues on its way with the rest of its match.

My problem/question is how to get this number X. How many tokens do I take out of the input token stream to shift-reduce on? My initial solution was simple- take them until a semicolon is reached. However, what about things like if statements? There is no semicolon at the ends of those expressions.

I'll give an example to my original plan, because it probably just sounds like gibberish. Say the grammar set is as follows:

(top-down) statement ::= <expression> ";"
(bottom-up) expression =
    multiplication ::= <number> "*" <number>
    addition ::= <number> "+" <number>

Then it would reduce like so:

3 + 4 * 2 ;
((bottom-up) 3 + 4 * 2) ;
    3 + 4 * 2
    3 + multiplication
    addition
addition ;
statement

My problem, again, is how do I get the number of tokens to shift-reduce? How do I know that I want all of "3 + 4 * 2" to be placed into the shift-reducer?

Upvotes: 1

Views: 650

Answers (1)

ebohlman
ebohlman

Reputation: 15003

I suspect that rather than switching strategies for "individual statements" you'd be better off doing it only for expressions. That way you don't need to take a fixed number of tokens out of the stream; you simply treat whatever your grammar defines as an expression as an atomic unit, which you pass along to your sub-parser.

Upvotes: 1

Related Questions