Confused regarding parsing with Bison

Question

I'm reading "Flex & Bison" by Levine and I'm a bit confused as to how exactly the parsing takes place. For this example I'll be using these rules for a simple calculator:

exp: factor | exp ADD factor | exp SUB factor
factor: term | factor MUL term | factor DIV term
term: NUMBER | ABS term | OP exp CP

I won't include the scanner here, but it does what you'd expect. My first source of confusion is with BNF itself, or rather, how it connects to context-free grammars. I was introduced to CFGs as a string-replacement/rewriting system, i.e. taking a symbol on the left, and replacing it with symbols on the right, but it seems to me the parser actually works "backwards", i.e. it doesn't replace the LHS with the RHS, but rather reduces matching RHS to the LHS. Is this correct?

My second question is demonstrated best through example. Suppose we wish to parse 1 + 3 * 2. The scanner takes 1 and gives us the token NUMBER, which (I'm assuming the parser here looks over all the rules that appear in right-hand sides) which it can reduce to "term", since there's no rule which starts with NUMBER except that one. By the same logic it can reduce the "term" to a "factor". It cannot reduce "factor" to "exp" however, since the next token may be a MUL or DIV. So it asks the scanner to get the next token, which is ADD. So right now the parser has in its stack "factor ADD". This matches no rule, so "factor" can safely be reduced to "exp". The stack is now "exp ADD". We need another token at this point, which is NUMBER. Again, reducing gets us "exp ADD term" and then "exp ADD factor". Here's where it gets confusing to me. The stack is now "exp ADD factor". Why doesn't the parser immediately reduce this to "exp", since it directly matches one of the rules? Instead what it actually does is look at the next token, which is a MUL, then the next token which is a NUMBER, then it reduces the NUMBER to a "term", meaning the stack is now "exp ADD factor MUL term" and only now does it reduce the last three symbols to a factor, then reducing the resulting "exp ADD factor" to an "exp".

Chris Dodd · Accepted Answer

The stack is now "exp ADD factor". Why doesn't the parser immediately reduce this to "exp", since it directly matches one of the rules?

Because the parser generated by bison is smart enough to realize that if it did do this reduction when the lookahead token is MUL, it would be stuck -- there's no rule that matches exp MUL... so it can't do that.

Basically the "state machine" built by bison encodes all the possible rules that might be reduced with the symbols/tokens on the stack so far and future tokens that might be in the input. So based on the current "state" (which is really a stack symbol, stored on the top of the state stack) and the current lookahead token it knows whether to shift (pushing a state corresponding to the lookahead token on the stack, and consuming the token) or reduce (popping states off the stack and then pushing another state based on the rule reduced and the state on the top of the stack after the pop), which does not consume the lookahead.

Confused regarding parsing with Bison

Answers (1)

Related Questions