Haoshen Hong
Haoshen Hong

Reputation: 13

Parser example of JavaCC for arithmetic expressions

It is my first day in parsing area. As my first example in JavaCC, the code is

SKIP:  { " " | "\t" | "\n" | "\r"                    }
TOKEN: { "(" | ")" | "+" | "*" | <NUM: (["0"-"9"])+> }

void S(): {} { E() <EOF>           }
void E(): {} { T() ("+" T())*      }
void T(): {} { F() ("*" F())*      }
void F(): {} { <NUM> | "(" E() ")" }

I wonder why it can handle the case like int+(int+int). By its algorithm, I think it would parse this expression as [int] & [(int] & [int)] as Ts and fail to parse. Why it got parsed as intended?

Upvotes: 1

Views: 986

Answers (1)

Theodore Norvell
Theodore Norvell

Reputation: 16241

Consider the input string "1+(2+3)" this lexes as a sequence of tokens

<NUM> "+" "(" <NUM> "+" <NUM> ")" <EOF>

This sequence can be derived from S() as follows. The . shows which tokens have been consumed. As tokens are consumed, the . moves right

         . S() 
           ~~~ expand
     ==> . E() <EOF>
           ~~~ expand
     ==> . T() ("+" T())* <EOF>
           ~~~ expand
     ==> . F() ("*" F())* ("+" T())* <EOF>
           ~~~ expand
     ==> . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* <EOF>
           ~~~~~~~~~~~~~~~~~~~~~ choose first and consume
     ==> <NUM> . ("*" F())* ("+" T())* <EOF>
                 ~~~~~~~~~~ delete
     ==> <NUM> . ("+" T())* <EOF>
                 ~~~~~~~~~~ unroll and consume
     ==> <NUM> "+" . T() ("+" T())* <EOF>
                     ~~~ expand
     ==> <NUM> "+" . F() ("*" F())* ("+" T())* <EOF>
                     ~~~~
     ==> <NUM> "+" . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* <EOF>
                     ~~~~~~~~~~~~~~~~~~~~~ choose second and consume
     ==> <NUM> "+" "(" . E() ")" ("*" F())* ("+" T())* <EOF>
                         ~~ expand
     ==> <NUM> "+" "(" . T() ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                         ~~~ expand
     ==> <NUM> "+" "(" . F() ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                         ~~~ expand
     ==> <NUM> "+" "(" . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                         ~~~~~~~~~~~~~~~~~~~~~ choose first and consume
     ==> <NUM> "+" "(" <NUM> . ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                               ~~~~~~~~~~ delete
     ==> <NUM> "+" "(" <NUM> . ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                               ~~~~~~~~~~~~~~ unroll and consume
     ==> <NUM> "+" "(" <NUM> "+" . T() ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                   ~~~ expand
     ==> <NUM> "+" "(" <NUM> "+" . F() ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                   ~~~ expand
     ==> <NUM> "+" "(" <NUM> "+" . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                   ~~~~~~~~~~~~~~~~~~~~~ choose first and consume
     ==> <NUM> "+" "(" <NUM> "+" <NUM> . ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                         ~~~~~~~~~~ delete
     ==> <NUM> "+" "(" <NUM> "+" <NUM> . ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                         ~~~~~~~~~~ delete and consume
     ==> <NUM> "+" "(" <NUM> "+" <NUM> ")" . ("*" F())* ("+" T())* <EOF>
                                             ~~~~~~~~~~ delete
     ==> <NUM> "+" "(" <NUM> "+" <NUM> ")" . ("+" T())* <EOF>
                                             ~~~~~~~~~~ delete and consume
     ==> <NUM> "+" "(" <NUM> "+" <NUM> ")" <EOF> .

Key:

  • expand: Replace a nonterminal with its definition.
  • choose: Replace (S|T) with either S or T
  • unroll: Replace (S)* with S (S)*
  • delete: Replace (S)* with nothing

The derivation above is a left-right derivation. I choose to show the left-right derivation because it's reflective of how JavaCC works.

Upvotes: 1

Related Questions