Reputation: 1
I am trying to create a parser for the DOCTOR script of the old ELIZA chatbot program.
The DOCTOR script is simplified here to a Welcome line followed by a line defining how "The Doctor" responds to a user input such as "IF ONLY I WAS THINNER":
(I AM THE DOCTOR.)
(IF 3 ((0 IF 0)(DO YOU 3)(YOU WISH THAT 3)))
Here is the Lexer:
ALL_CHARS: [0-9A-Z, .];
KEY_CHARS: [A-Z];
LPAREN: '(';
RPAREN: ')';
NUM: [0-9];
SPACE: ' ';
WS: ('\n')+ -> skip;
and the Parser:
main: item* EOF;
item: (rWelcome | rKeyDecompReAssy );
rKeyDecompReAssy: LPAREN rKeyPri rDecompReAssy RPAREN;
rKeyPri: rKey SPACE rPri;
rKey: KEY_CHARS+;
rPri: NUM+;
rDecompReAssy: LPAREN rDecomp rReAssyList RPAREN;
rDecomp: LPAREN ALL_CHARS+ RPAREN;
rReAssyList: (rReAssy)+;
rReAssy: LPAREN reAssy RPAREN;
reAssy: ALL_CHARS+;
rWelcome: LPAREN reAssy RPAREN;
The parser defines one rule for the Welcome line (rWelcome) and one for the IF line (rKeyDecompReAssy); the latter attempts to match four components: Key, Pri, Decomp and ReAssyList.
I am using the ANTLR Preview in Android Studio.
The problem is that both lines are matched to rWelcome.
The Welcome line is OK of course, but the error message for the second is:
line 2:6 missing ')' at '('
line 2:45 mismatched input ')' expecting {<EOF>, '('}
How do I make the two rules unambiguous?
Upvotes: 0
Views: 165
Reputation: 170158
As mentioned in the comment, your lexer never creates KEY_CHARS, SPACE or NUM tokens. This is because the ALL_CHARS token also matches the chars defined in those tokens. And when 2 or more lexer rules match the same characters, the one defined first "wins". No matter if a parser rule is trying to match a KEY_CHARS token, the lexer simply creates an ALL_CHARS token: the lexer works independently from the parser.
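To make the rule-priority point concrete, here is a small Python sketch that imitates how the lexer chooses a token (longest match, then the rule defined first). It is not ANTLR itself; the `tokenize` helper and the `ORIGINAL_RULES` table are illustrative names, and WS is left out for brevity.

```python
import re

# Lexer rules in the order the question defines them (WS omitted).
ORIGINAL_RULES = [
    ("ALL_CHARS", r"[0-9A-Z, .]"),
    ("KEY_CHARS", r"[A-Z]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NUM", r"[0-9]"),
    ("SPACE", r" "),
]

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Start of the second script line, up to the point where the error is reported.
tokens = tokenize("(IF 3 (", ORIGINAL_RULES)
names = [name for name, _ in tokens]
print(names)
```

Every letter, digit and space comes out as ALL_CHARS, so the parser only ever sees LPAREN followed by ALL_CHARS tokens. That fits rWelcome until the second '(' at column 6, which is where the "missing ')'" error is reported.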
What you could do is something like this:
main : item* EOF;
item : (rWelcome | rKeyDecompReAssy );
rKeyDecompReAssy : LPAREN rKeyPri rDecompReAssy RPAREN;
rKeyPri : rKey SPACE rPri SPACE; // Note: I added the last `SPACE`
rKey : KEY_CHARS+;
rPri : NUM+;
rDecompReAssy : LPAREN rDecomp rReAssyList RPAREN;
rDecomp : LPAREN all_chars+ RPAREN;
rReAssyList : (rReAssy)+;
rReAssy : LPAREN reAssy RPAREN;
reAssy : all_chars+;
rWelcome : LPAREN reAssy RPAREN;
all_chars : NUM | KEY_CHARS | SPACE | OTHER_CHAR;
KEY_CHARS : [A-Z];
LPAREN : '(';
RPAREN : ')';
NUM : [0-9];
SPACE : ' ';
WS : ('\n')+ -> skip;
OTHER_CHAR : [.,];
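As a rough check of the grammar above, the following Python sketch is a hand-written recognizer that mirrors its rules on the two sample lines. It is not generated by ANTLR and does not reproduce ANTLR's prediction algorithm; all function names (`token_type`, `paren_text`, `key_line`, `classify`) are made up for illustration.

```python
# Revised lexer: the character classes no longer overlap, so rule order
# no longer decides which token a character becomes.
def token_type(ch):
    if ch == "(":
        return "LPAREN"
    if ch == ")":
        return "RPAREN"
    if "A" <= ch <= "Z":
        return "KEY_CHARS"
    if "0" <= ch <= "9":
        return "NUM"
    if ch == " ":
        return "SPACE"
    if ch in ".,":
        return "OTHER_CHAR"
    raise ValueError(f"unexpected character {ch!r}")

ALL_CHARS = {"NUM", "KEY_CHARS", "SPACE", "OTHER_CHAR"}  # parser rule all_chars

def run(toks, i, allowed):
    """Consume one or more tokens whose type is in `allowed`; None on failure."""
    if i is None:
        return None
    j = i
    while j < len(toks) and toks[j] in allowed:
        j += 1
    return j if j > i else None

def expect(toks, i, kind):
    """Consume exactly one token of type `kind`; None on failure."""
    if i is not None and i < len(toks) and toks[i] == kind:
        return i + 1
    return None

def paren_text(toks, i):
    """LPAREN all_chars+ RPAREN: the shape of rWelcome, rDecomp and rReAssy."""
    i = expect(toks, i, "LPAREN")
    i = run(toks, i, ALL_CHARS)
    return expect(toks, i, "RPAREN")

def key_line(toks, i=0):
    """rKeyDecompReAssy: LPAREN rKeyPri rDecompReAssy RPAREN."""
    i = expect(toks, i, "LPAREN")
    i = run(toks, i, {"KEY_CHARS"})  # rKey
    i = expect(toks, i, "SPACE")
    i = run(toks, i, {"NUM"})        # rPri
    i = expect(toks, i, "SPACE")     # the trailing SPACE added to rKeyPri
    i = expect(toks, i, "LPAREN")    # rDecompReAssy opens
    i = paren_text(toks, i)          # rDecomp
    i = paren_text(toks, i)          # first rReAssy
    while i is not None:             # any further rReAssy
        nxt = paren_text(toks, i)
        if nxt is None:
            break
        i = nxt
    i = expect(toks, i, "RPAREN")    # rDecompReAssy closes
    return expect(toks, i, "RPAREN")

def classify(line):
    toks = [token_type(ch) for ch in line]
    if paren_text(toks, 0) == len(toks):
        return "rWelcome"
    if key_line(toks) == len(toks):
        return "rKeyDecompReAssy"
    return None

print(classify("(I AM THE DOCTOR.)"))
print(classify("(IF 3 ((0 IF 0)(DO YOU 3)(YOU WISH THAT 3)))"))
```

With distinct token types, the second line can no longer pass as rWelcome: the '(' after "IF 3 " is not an all_chars token, so only rKeyDecompReAssy fits.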
Upvotes: 1