Reputation: 9724
I'm trying to create a parser for source code like this:
[code table 1.0]
code table code_table_name
id = 500
desc = "my code table one"
end code table
... and here is the grammar I defined:
PARSER_BEGIN(CodeTableParser)
...
PARSER_END(CodeTableParser)
/* skip spaces */
SKIP: {
" "
| "\t"
| "\r"
| "\n"
}
/* reserved words */
TOKEN [IGNORE_CASE]: {
<CODE_TAB_HEADER: "[code table 1.0]">
| <CODE_TAB_END: "end" (" ")+ <CODE_TAB_BEGIN>>
| <CODE_TAB_BEGIN: <IDENT> | "code" (" ")+ "table">
| <ID: "id">
| <DESC: "desc">
}
/* token images */
TOKEN: {
<NUMBER: (<DIGIT>)+>
| <IDENT: (<ALPHA>)+>
| <VALUE: (<ALPHA> ["[", "]"])+>
| <STRING: <QUOTED>>
}
TOKEN: {
<#ALPHA: ["A"-"Z", "a"-"z", "0"-"9", "$", "_", "."]>
| <#DIGIT: ["0"-"9"]>
| <#QUOTED: "\"" (~["\""])* "\"">
}
void parse():
{
}
{
expression() <EOF>
}
void expression():
{
Token tCodeTab;
}
{
<CODE_TAB_HEADER>
<CODE_TAB_BEGIN>
tCodeTab = <IDENT>
(
<ID>
<DESC>
)*
<CODE_TAB_END>
}
The problem is that the parser correctly recognizes the token CODE_TAB_BEGIN ("code table"), but it doesn't recognize the token IDENT ("code_table_name"), since that name starts with a word already used in CODE_TAB_BEGIN (i.e. "code"). The parser complains that "code is followed by invalid character _"...
Having said that, I'm wondering what I'm missing in order to let the parser work correctly. I'm a newbie and any help would be really appreciated ;-)
Thanks, j3d
Upvotes: 1
Views: 1223
Reputation: 16221
Your lexer will never produce an IDENT because the production
<CODE_TAB_BEGIN: <IDENT> | "code" (" ")+ "table">
says that every IDENT is also a CODE_TAB_BEGIN. Both productions then match exactly the same characters, and when two matches tie on length the production that appears first in the grammar file wins, so CODE_TAB_BEGIN always beats IDENT. (RTFFAQ)
Replace that production by
<CODE_TAB_BEGIN: "code" (" ")+ "table">
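You will still run into trouble with ID and DESC (for one thing, no token matches the "=" in the input, and expression() expects <ID> to be followed directly by <DESC>), but this gets you past the second line of input.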
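To see why the order matters, here is a rough sketch in plain Java of the selection rule described above: the longest match wins, and on a tie the production declared first wins. This is only a toy model built on java.util.regex, not JavaCC's actual implementation; the class name TokenChoiceDemo and its helper methods are made up for illustration, and IGNORE_CASE is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a toy model of how a JavaCC-style lexer
// chooses a token kind. It does not use or reproduce JavaCC itself.
class TokenChoiceDemo {

    // Character set copied from the question's #ALPHA definition.
    static final String IDENT = "[A-Za-z0-9$_.]+";
    static final String CODE_TABLE = "code +table";

    // Productions in declaration order, as in the question:
    // CODE_TAB_BEGIN also accepts anything IDENT accepts.
    static Map<String, Pattern> original() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        // java.util.regex tries alternatives left to right rather than
        // longest-first, so the literal is listed first to mimic the lexer.
        rules.put("CODE_TAB_BEGIN", Pattern.compile(CODE_TABLE + "|" + IDENT));
        rules.put("IDENT", Pattern.compile(IDENT));
        return rules;
    }

    // Same productions with the <IDENT> alternative removed from CODE_TAB_BEGIN.
    static Map<String, Pattern> corrected() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("CODE_TAB_BEGIN", Pattern.compile(CODE_TABLE));
        rules.put("IDENT", Pattern.compile(IDENT));
        return rules;
    }

    // Kind of the token at the start of the input, or null if nothing matches.
    static String firstTokenKind(Map<String, Pattern> rules, String input) {
        String bestKind = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater: a later production never displaces an
            // earlier one that matched the same number of characters.
            if (m.lookingAt() && m.end() > bestLength) {
                bestKind = rule.getKey();
                bestLength = m.end();
            }
        }
        return bestKind;
    }

    public static void main(String[] args) {
        System.out.println(firstTokenKind(original(), "code_table_name"));  // CODE_TAB_BEGIN
        System.out.println(firstTokenKind(corrected(), "code_table_name")); // IDENT
        System.out.println(firstTokenKind(corrected(), "code table"));      // CODE_TAB_BEGIN
    }
}
```

With the original productions, "code_table_name" is matched in full by both kinds, so the earlier one (CODE_TAB_BEGIN) is reported and IDENT never appears. Once the <IDENT> alternative is removed, the same text comes out as IDENT, while "code table" is still recognized as CODE_TAB_BEGIN because it is the longer match.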
Upvotes: 2