Reputation: 539
I'm using JavaCC to build a parser for a subset of Pascal.
This is my code:
PARSER_BEGIN(Pascal)
import java.io.*;
public class Pascal {
public static void main(String args[]) throws ParseException,IOException {
Pascal parser = new Pascal(new FileInputStream(args[0]));
parser.Programa();
}
}
PARSER_END(Pascal)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
TOKEN :
{
<PROGRAM: "program">
| <INTEIRO: "integer">
| <REAL: "real">
| <VAR: "var">
| <OF: "of">
| <FUNCTION: "function">
| <PROCEDURE: "procedure">
| <LBRACE:"(">
| <RBRACE: ")">
| <SEMI: ";">
| <PTS: ":">
| <BEGIN: "begin">
| <END: "end">
| <ATRIB: ":=">
| <ARRAY: "array">
| <LBRACKETS: "[">
| <RBRACKETS: "]">
| <IF: "if">
| <THEN: "then">
| <ELSE: "else">
| <NOT: "not">
| <PLUS: "+">
| <MINUS: "-">
| <WHILE: "while">
| <DO: "do">
}
TOKEN :
{
<OPERADOR_MULTIPLICATIVO: ("*"|"/"|"div"|"mod"|"and")>
|
<OPERADOR_ADITIVO: ("+"| "-" | "or")>
|
<OPERADOR_RELACIONAL: ("=" | "<>" | "<" | "<=" | ">=" | ">")>
|
<ID: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"])*>
|
<DIGT: ["0"-"9"] (["0"-"9"])*>
}
void Programa () :
{}
{ <PROGRAM> <ID> <LBRACE> Lista_de_identificadores() <RBRACE> <SEMI>
Declaracoes()
Declara_subprogram()
Enunciado_composto()
<EOF>
}
// lista_de_identificadores
void Lista_de_identificadores():
{}
{
<ID> Lista2()
}
void Lista2():
{}
{
("," <ID> Lista2())?
}
//declarações
void Declaracoes():
{}
{
(<VAR> Lista_de_identificadores() <PTS> Tipo() <SEMI>)*
}
// tipo
void Tipo():
{}
{
(Tipo_padrao() | <ARRAY> <LBRACKETS> <DIGT> <RBRACKETS> <OF> Tipo_padrao())
}
//tipo_padrao
void Tipo_padrao():
{}
{
(<INTEIRO> | <REAL>)
}
//declarações_de_subprogramas
void Declara_subprogram():
{}
{
(Subprogram() <SEMI>)*
}
//declaração_de_subprograma
void Subprogram():
{}
{
Cabecalho_subprogram()
Declaracoes()
Enunciado_composto()
}
//cabeçalho_de_subprograma
void Cabecalho_subprogram():
{}
{
(<FUNCTION> <ID> Argumentos() <PTS> Tipo_padrao() <SEMI>) | (<PROCEDURE> <ID> Argumentos())
}
//argumentos
void Argumentos():
{}
{
(<LBRACE> Lista_parametros() <RBRACE>)?
}
//lista_de_parâmetros
void Lista_parametros():
{}
{
Lista_de_identificadores() <PTS> Tipo() Lista_parametros2()
}
void Lista_parametros2():
{}
{
(<SEMI> Lista_de_identificadores() <PTS> Tipo() Lista_parametros2())?
}
//enunciado_composto
void Enunciado_composto():
{}
{
<BEGIN> Enunciados_opcionais() <END>
}
//enunciados_opcionais
void Enunciados_opcionais():
{}
{
(Lista_enunciados())?
}
//lista_de_enunciados
void Lista_enunciados():
{}
{
Enunciado() (<SEMI> Enunciado())*
}
void Enunciado():
{}
{
LOOKAHEAD(5)(Variavel() <ATRIB> Expressao()) | (Chamada_procedimento()) | (Enunciado_composto()) | (<IF> Expressao() <THEN> Enunciado() <ELSE> Enunciado()) | (<WHILE> Expressao() <DO> Enunciado())
}
void Variavel():
{}
{
LOOKAHEAD(2)(<ID>) | (<ID> <LBRACKETS> Expressao() <RBRACKETS>)
}
void Chamada_procedimento():
{}
{
LOOKAHEAD(2)(<ID>) | (<ID> <LBRACE> Lista_expressoes() <RBRACE>)
}
void Lista_expressoes():
{}
{
Expressao() Lista_expressoes2()
}
void Lista_expressoes2():
{}
{
("," Expressao() Lista_expressoes2())?
}
void Expressao():
{}
{
LOOKAHEAD(2)Expressao_simples() | Expressao_simples() <OPERADOR_RELACIONAL> Expressao_simples()
}
void Expressao_simples():
{}
{
LOOKAHEAD(3)(Termo() Expressao_simples2()) | (Sinal() Termo() Expressao_simples2())
}
void Expressao_simples2():
{}
{
(<OPERADOR_ADITIVO> Termo() Expressao_simples2())?
}
void Termo():
{}
{
Fator() Termo2()
}
void Termo2():
{}
{
(<OPERADOR_MULTIPLICATIVO> Fator() Termo2())?
}
void Fator():
{}
{
LOOKAHEAD(2)(<ID>) | (<ID> <LBRACE> Lista_expressoes() <RBRACE>) | (<DIGT>) | (<LBRACE> Expressao() <RBRACE>) | (<NOT> Fator())
}
void Sinal():
{}
{
(<PLUS> | <MINUS>)
}
And this is the input program:
program exemplo (input, output, test);
var x, y: integer;
function mdc (a, b: integer): integer;
begin
if b = 0 then mdc := a
else mdc := mdc (b, a mod b)
end;
begin
read(x, y);
write(mdc(x,y));
end;
Running the JavaCC-generated parser on it gives this:
Exception in thread "main" ParseException: Encountered " <OPERADOR_RELACIONAL> "= "" at line 5, column 14.
Was expecting one of:
"then" ...
<OPERADOR_MULTIPLICATIVO> ...
<OPERADOR_ADITIVO> ...
<OPERADOR_MULTIPLICATIVO> ...
<OPERADOR_ADITIVO> ...
<OPERADOR_ADITIVO> ...
<OPERADOR_MULTIPLICATIVO> ...
<OPERADOR_ADITIVO> ...
at Pascal.generateParseException(Pascal.java:984)
at Pascal.jj_consume_token(Pascal.java:865)
at Pascal.Enunciado(Pascal.java:270)
at Pascal.Lista_enunciados(Pascal.java:235)
at Pascal.Enunciados_opcionais(Pascal.java:223)
at Pascal.Enunciado_composto(Pascal.java:211)
at Pascal.Subprogram(Pascal.java:137)
at Pascal.Declara_subprogram(Pascal.java:127)
at Pascal.Programa(Pascal.java:20)
at Pascal.main(Pascal.java:9)
The problem is that I can't understand why the parser expects those tokens and rejects the "=" at that position. The part of the grammar relevant to this context is the following (almost the complete code):
void Enunciado():
{}
{
LOOKAHEAD(5)(Variavel() <ATRIB> Expressao()) | (Chamada_procedimento()) | (Enunciado_composto()) | (<IF> Expressao() <THEN> Enunciado() <ELSE> Enunciado()) | (<WHILE> Expressao() <DO> Enunciado())
}
void Variavel():
{}
{
LOOKAHEAD(2)(<ID>) | (<ID> <LBRACKETS> Expressao() <RBRACKETS>)
}
void Chamada_procedimento():
{}
{
LOOKAHEAD(2)(<ID>) | (<ID> <LBRACE> Lista_expressoes() <RBRACE>)
}
void Lista_expressoes():
{}
{
Expressao() Lista_expressoes2()
}
void Lista_expressoes2():
{}
{
("," Expressao() Lista_expressoes2())?
}
void Expressao():
{}
{
LOOKAHEAD(2)Expressao_simples() | Expressao_simples() <OPERADOR_RELACIONAL> Expressao_simples()
}
void Expressao_simples():
{}
{
LOOKAHEAD(3)(Termo() Expressao_simples2()) | (Sinal() Termo() Expressao_simples2())
}
void Expressao_simples2():
{}
{
(<OPERADOR_ADITIVO> Termo() Expressao_simples2())?
}
void Termo():
{}
{
Fator() Termo2()
}
void Termo2():
{}
{
(<OPERADOR_MULTIPLICATIVO> Fator() Termo2())?
}
void Fator():
{}
{
LOOKAHEAD(2)(<ID>) | (<ID> <LBRACE> Lista_expressoes() <RBRACE>) | (<DIGT>) | (<LBRACE> Expressao() <RBRACE>) | (<NOT> Fator())
}
Can anyone figure out where the error is? I've tried a lot of things, and the grammar looks right to me now (although it evidently isn't). Thanks.
EDIT: The productions whose names end in 2 exist to eliminate left recursion.
Upvotes: 1
Views: 612
Reputation: 16241
The problem is that you are using LOOKAHEAD in a way that just won't work. For example, you have
LOOKAHEAD(2)Expressao_simples()
| Expressao_simples() <OPERADOR_RELACIONAL> Expressao_simples()
So this says that if the next two tokens of input are consistent with an Expressao_simples, take the first alternative; otherwise take the second alternative. Clearly, in any situation where the second alternative might succeed, the next two tokens will be consistent with the first alternative too, so the first will always be chosen. For `if b = 0 then`, Expressao therefore returns after consuming only `b`, and Enunciado then expects <THEN> but finds "=", which is exactly the error you see.
Instead, you can delay the choice until later:
Expressao_simples()
(
<OPERADOR_RELACIONAL> Expressao_simples()
)?
Compare this code to the diagram in the Pascal Report (revised).
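To make the difference concrete, here is a minimal hand-written sketch in plain Java, not JavaCC output. It assumes a heavily simplified Expressao_simples that accepts a single identifier or number, and the class and method names (DelayedChoiceSketch, tryIf, and so on) are made up for illustration. One method commits to the first alternative the way the LOOKAHEAD(2) version effectively does; the other parses the common prefix first and only then checks for a relational operator.

```java
import java.util.List;

public class DelayedChoiceSketch {
    private static final List<String> RELATIONAL = List.of("=", "<>", "<", "<=", ">=", ">");
    private static final List<String> KEYWORDS = List.of("if", "then", "else");

    private final List<String> tokens;
    private int pos = 0;

    DelayedChoiceSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String literal) {
        if (!peek().equals(literal)) {
            throw new IllegalStateException(
                "expected \"" + literal + "\" but found \"" + peek() + "\"");
        }
        pos++;
    }

    // Assumption: a stand-in for Expressao_simples that accepts one identifier or number.
    private void expressaoSimples() {
        String t = peek();
        boolean operand = t.matches("[A-Za-z][A-Za-z0-9]*|[0-9]+") && !KEYWORDS.contains(t);
        if (!operand) {
            throw new IllegalStateException("expected an operand but found \"" + t + "\"");
        }
        pos++;
    }

    // Mimics the original grammar: the lookahead check always succeeds,
    // so only the first alternative (a bare Expressao_simples) is ever parsed.
    private void expressaoCommitEarly() {
        expressaoSimples();
    }

    // Factored version: parse the common prefix, then decide on the optional tail.
    private void expressaoFactored() {
        expressaoSimples();
        if (RELATIONAL.contains(peek())) {
            pos++;
            expressaoSimples();
        }
    }

    // Parses "if <expression> then" and reports "ok" or the error message.
    public static String tryIf(boolean factored, List<String> input) {
        DelayedChoiceSketch p = new DelayedChoiceSketch(input);
        try {
            p.expect("if");
            if (factored) {
                p.expressaoFactored();
            } else {
                p.expressaoCommitEarly();
            }
            p.expect("then");
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("if", "b", "=", "0", "then");
        System.out.println(tryIf(false, input)); // prints: expected "then" but found "="
        System.out.println(tryIf(true, input));  // prints: ok
    }
}
```

The early-commit variant fails in the same place as the reported ParseException, while the factored variant consumes the operator before the caller looks for "then".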
Upvotes: 3
Reputation: 30735
I don't have this particular parser generator available to test, but it seems strange that the parser seems to be regarding '= ' as a single token. I would investigate that first.
If that doesn't reveal the problem, then the next thing to investigate is your definition of Expressao_simples.
I'm afraid that the easiest way to investigate a problem like this is a bit painful - to temporarily strip back the grammar to the simplest possible case, see if the parser accepts that and, if it does, expand the grammar and re-test until you identify the problem.
In other words, start by defining PROGRAM as
PROGRAM : "a" "=" "b"
If the parser accepts that, try
PROGRAM : IDENTIFIER "=" IDENTIFIER
then
PROGRAM : IDENTIFIER RELATIONALOPERATOR IDENTIFIER
then
PROGRAM : SIMPLEEXPRESSION RELATIONALOPERATOR SIMPLEEXPRESSION
etc. Eventually, you should find the construct which is causing the problem.
I'd say "good luck!", but you don't really need it, just a lot of patience and simple test-cases.
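A related way to organise those simple test-cases is to order inputs from smallest to largest and report the first one the parser rejects. The sketch below is only an illustration in plain Java: SmallestFailingCase, firstRejected and standInAccepts are hypothetical names, and standInAccepts is a stand-in that merely imitates the reported symptom (rejecting a comparison right after "if"). In practice you would replace it with a predicate that runs your generated parser and returns false when it throws.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class SmallestFailingCase {

    // Returns the first input (in the given order) that the parser does not accept.
    public static Optional<String> firstRejected(List<String> cases, Predicate<String> accepts) {
        for (String source : cases) {
            if (!accepts.test(source)) {
                return Optional.of(source);
            }
        }
        return Optional.empty();
    }

    // Assumption: a fake "parser" that rejects any input with a comparison after "if".
    // Swap this for a call into the real generated parser.
    public static boolean standInAccepts(String source) {
        return !source.matches(".*\\bif\\s+\\w+\\s*=.*");
    }

    public static void main(String[] args) {
        // Ordered from the simplest construct to the one suspected of failing.
        List<String> cases = List.of(
            "begin end",
            "begin x := 1 end",
            "begin if b then x := 1 else x := 2 end",
            "begin if b = 0 then x := 1 else x := 2 end");

        String result = firstRejected(cases, SmallestFailingCase::standInAccepts)
            .orElse("all cases accepted");
        System.out.println(result); // prints: begin if b = 0 then x := 1 else x := 2 end
    }
}
```

Because the cases grow one construct at a time, the first rejected input points at the construct that was just added, here the relational operator inside the if condition.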
Upvotes: 2