j3d

Reputation: 9724

JavaCC: How to handle tokens that contain common words

I'm trying to create a parser for source code like this:

[code table 1.0]
code table code_table_name
    id = 500
    desc = "my code table one"
end code table

Here is the grammar I defined:

PARSER_BEGIN(CodeTableParser)
...
PARSER_END(CodeTableParser)

/* skip spaces */
SKIP: {
         " "
    |    "\t"
    |    "\r"
    |    "\n"
}

/* reserved words */
TOKEN [IGNORE_CASE]: {
        <CODE_TAB_HEADER:     "[code table 1.0]">
    |   <CODE_TAB_END:        "end" (" ")+ <CODE_TAB_BEGIN>>
    |   <CODE_TAB_BEGIN:      <IDENT> | "code" (" ")+ "table">
    |   <ID:                  "id">
    |   <DESC:                "desc">
}

/* token images */
TOKEN: {
        <NUMBER:  (<DIGIT>)+>
    |   <IDENT:   (<ALPHA>)+>
    |   <VALUE:   (<ALPHA> ["[", "]"])+>
    |   <STRING:  <QUOTED>>
}

TOKEN: {
        <#ALPHA:  ["A"-"Z", "a"-"z", "0"-"9", "$", "_", "."]>
    |   <#DIGIT:  ["0"-"9"]>
    |   <#QUOTED: "\"" (~["\""])* "\"">
}

void parse():
{
}
{
    expression() <EOF>
}

void expression():
{
    Token tCodeTab;
}
{
    <CODE_TAB_HEADER>
    <CODE_TAB_BEGIN>
    tCodeTab = <IDENT>
    (
        <ID>
        <DESC>
    )*
    <CODE_TAB_END>
}

The problem is that the parser correctly identifies the CODE_TAB_BEGIN token ("code table"), but it doesn't identify the IDENT token ("code_table_name"), since that name contains a word already used in CODE_TAB_BEGIN (i.e. "code"). The parser complains that "code is followed by invalid character _".

What am I missing to make the parser work correctly? I'm a newbie, and any help would be really appreciated ;-)

Thanks, j3d

Upvotes: 1

Views: 1223

Answers (1)

Theodore Norvell

Reputation: 16221

Your lexer will never produce an IDENT because the production

<CODE_TAB_BEGIN:      <IDENT> | "code" (" ")+ "table">

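says that every IDENT can be a CODE_TAB_BEGIN. JavaCC's lexer takes the longest match and, when two productions match the same text, picks the one declared first. As this production comes before IDENT, it beats the production for IDENT by that first match rule. (RTFFAQ)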

Replace that production by

<CODE_TAB_BEGIN:      "code" (" ")+ "table">

You will run into trouble with ID and DESC, but this gets you past the second line of input.
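
As a rough sketch of where the ID and DESC trouble may come from (this is one reading, not something spelled out above): the sample input has lines such as id = 500 and desc = "my code table one", but the grammar has no token for "=" and the loop in expression() expects a bare <ID> followed directly by <DESC>. The fragment below assumes the corrected CODE_TAB_BEGIN production and introduces a hypothetical EQ token that is not part of the original grammar; the rest of the token definitions stay as posted.

```javacc
/* punctuation: EQ is a hypothetical token name, not in the original grammar */
TOKEN: {
        <EQ: "=">
}

void expression():
{
    Token tCodeTab;
}
{
    <CODE_TAB_HEADER>
    <CODE_TAB_BEGIN>
    tCodeTab = <IDENT>
    (
        /* id = 500: "500" lexes as NUMBER because NUMBER is declared before IDENT */
        <ID>   <EQ> <NUMBER>
        /* desc = "my code table one" */
      | <DESC> <EQ> <STRING>
    )*
    <CODE_TAB_END>
}
```

Note that "id" and "desc" are matched by both the keyword productions and IDENT with the same length, so the keywords win because they are declared first. A code table actually named "id" or "desc" would therefore not be accepted where the grammar expects IDENT.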

Upvotes: 2
