Jorge Nachtigall

Reputation: 539

JavaCC parser generator not recognizing my language (input)

I'm working with JavaCC to build a parser for a Pascal subset.

This is my code:

PARSER_BEGIN(Pascal)
import java.io.*;
public class Pascal {

  public static void main(String args[]) throws ParseException,IOException {

    Pascal parser = new Pascal(new FileInputStream(args[0]));
    parser.Programa();
  }

}

PARSER_END(Pascal)

SKIP :
{
  " "
| "\t"
| "\n"
| "\r"
}

TOKEN :
{
  <PROGRAM: "program">
| <INTEIRO: "integer">
| <REAL: "real">
| <VAR: "var">
| <OF: "of">
| <FUNCTION: "function">
| <PROCEDURE: "procedure">
| <LBRACE:"(">
| <RBRACE: ")">
| <SEMI: ";">
| <PTS: ":">
| <BEGIN: "begin">
| <END: "end">
| <ATRIB: ":=">
| <ARRAY: "array">
| <LBRACKETS: "[">
| <RBRACKETS: "]">
| <IF: "if">
| <THEN: "then">
| <ELSE: "else">
| <NOT: "not">
| <PLUS: "+">
| <MINUS: "-">
| <WHILE: "while">
| <DO: "do">
}

TOKEN :
{
 <OPERADOR_MULTIPLICATIVO: ("*"|"/"|"div"|"mod"|"and")>
|
 <OPERADOR_ADITIVO: ("+"| "-" | "or")>
|
 <OPERADOR_RELACIONAL: ("=" | "<>" | "<" | "<=" | ">=" | ">")>
|
 <ID: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"])*>
|
 <DIGT: ["0"-"9"] (["0"-"9"])*>

}



void Programa () :
{}
{ <PROGRAM> <ID> <LBRACE> Lista_de_identificadores() <RBRACE> <SEMI> 
  Declaracoes()
  Declara_subprogram() 
  Enunciado_composto()
  <EOF> 
}

// lista_de_identificadores

void Lista_de_identificadores():
{}
{
  <ID> Lista2()
}

void Lista2():
{}
{
 ("," <ID> Lista2())?
}

//declarações

void Declaracoes():
{}
{
    (<VAR> Lista_de_identificadores() <PTS> Tipo() <SEMI>)*
}

// tipo

void Tipo():
{}
{
    (Tipo_padrao() | <ARRAY> <LBRACKETS> <DIGT> <RBRACKETS> <OF> Tipo_padrao())
}

//tipo_padrao

void Tipo_padrao():
{}
{
    (<INTEIRO> | <REAL>)
}

//declarações_de_subprogramas

void Declara_subprogram():
{}
{
    (Subprogram() <SEMI>)*
}

//declaração_de_subprograma

void Subprogram():
{}
{
    Cabecalho_subprogram()
    Declaracoes()
    Enunciado_composto()
}

//cabeçalho_de_subprograma

void Cabecalho_subprogram():
{}
{
    (<FUNCTION> <ID> Argumentos() <PTS> Tipo_padrao() <SEMI>) | (<PROCEDURE> <ID> Argumentos())
}

//argumentos

void Argumentos():
{}
{
    (<LBRACE> Lista_parametros() <RBRACE>)?
}

//lista_de_parâmetros

void Lista_parametros():
{}
{
    Lista_de_identificadores() <PTS> Tipo() Lista_parametros2()
}

void Lista_parametros2():
{}
{
    (<SEMI> Lista_de_identificadores() <PTS> Tipo() Lista_parametros2())?
}

//enunciado_composto

void Enunciado_composto():
{}
{
    <BEGIN> Enunciados_opcionais() <END>    
}

//enunciados_opcionais

void Enunciados_opcionais():
{}
{
    (Lista_enunciados())?
}

//lista_de_enunciados

void Lista_enunciados():
{}
{
    Enunciado() (<SEMI> Enunciado())*
}

void Enunciado():
{}
{
    LOOKAHEAD(5)(Variavel() <ATRIB> Expressao()) | (Chamada_procedimento()) | (Enunciado_composto()) | (<IF> Expressao() <THEN> Enunciado() <ELSE> Enunciado()) | (<WHILE> Expressao() <DO> Enunciado())
}

void Variavel():
{}
{
    LOOKAHEAD(2)(<ID>) | (<ID> <LBRACKETS> Expressao() <RBRACKETS>)
}

void Chamada_procedimento():
{}
{
    LOOKAHEAD(2)(<ID>) | (<ID> <LBRACE> Lista_expressoes() <RBRACE>)
}

void Lista_expressoes():
{}
{
    Expressao() Lista_expressoes2() 
}

void Lista_expressoes2():
{}
{
    ("," Expressao() Lista_expressoes2())?
}

void Expressao():
{}
{
    LOOKAHEAD(2)Expressao_simples() | Expressao_simples() <OPERADOR_RELACIONAL> Expressao_simples()
}

void Expressao_simples():
{}
{
    LOOKAHEAD(3)(Termo() Expressao_simples2()) | (Sinal() Termo() Expressao_simples2())
}

void Expressao_simples2():
{}
{
    (<OPERADOR_ADITIVO> Termo() Expressao_simples2())?
}

void Termo():
{}
{
    Fator() Termo2()
}

void Termo2():
{}
{
    (<OPERADOR_MULTIPLICATIVO> Fator() Termo2())?
}

void Fator():
{}
{
    LOOKAHEAD(2)(<ID>) | (<ID> <LBRACE> Lista_expressoes() <RBRACE>) | (<DIGT>) | (<LBRACE> Expressao() <RBRACE>) | (<NOT> Fator())
}

void Sinal():
{}
{
    (<PLUS> | <MINUS>)
}

And this is the input program:

program exemplo (input, output, test);
var x, y: integer;
function mdc (a, b: integer): integer;
begin
    if b = 0 then mdc := a
    else mdc := mdc (b, a mod b)
end;

begin
    read(x, y);
    write(mdc(x,y));
end;

The generated parser reports this:

Exception in thread "main" ParseException: Encountered " <OPERADOR_RELACIONAL> "= "" at line 5, column 14.
Was expecting one of:
    "then" ...
    <OPERADOR_MULTIPLICATIVO> ...
    <OPERADOR_ADITIVO> ...
    <OPERADOR_MULTIPLICATIVO> ...
    <OPERADOR_ADITIVO> ...
    <OPERADOR_ADITIVO> ...
    <OPERADOR_MULTIPLICATIVO> ...
    <OPERADOR_ADITIVO> ...

        at Pascal.generateParseException(Pascal.java:984)
        at Pascal.jj_consume_token(Pascal.java:865)
        at Pascal.Enunciado(Pascal.java:270)
        at Pascal.Lista_enunciados(Pascal.java:235)
        at Pascal.Enunciados_opcionais(Pascal.java:223)
        at Pascal.Enunciado_composto(Pascal.java:211)
        at Pascal.Subprogram(Pascal.java:137)
        at Pascal.Declara_subprogram(Pascal.java:127)
        at Pascal.Programa(Pascal.java:20)
        at Pascal.main(Pascal.java:9)

The problem is that I can't understand why JavaCC expects those tokens and rejects the "=" at that position. The part of the grammar that matters in this context (almost the complete code) is the group of productions from Enunciado through Fator above.

Can someone figure out where the error is? I've tried a lot of things, and this version looks right to me (but it clearly isn't). Thanks.

EDIT: The productions that repeat another production's name with a 2 at the end (Lista2, Lista_parametros2, Lista_expressoes2, Expressao_simples2, Termo2) exist to eliminate left recursion.
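For readers unfamiliar with the technique: a left-recursive rule such as `termo -> termo OPERADOR_MULTIPLICATIVO fator | fator` makes Termo call itself before consuming any token, which a recursive-descent parser (the kind JavaCC generates) cannot handle. Rewriting it as `Termo -> Fator Termo2` with `Termo2 -> (OPERADOR_MULTIPLICATIVO Fator Termo2)?` avoids that. The hand-written Java sketch below mirrors that shape; the class name `TermoDemo`, the pre-split token list, and the simplified factor (a single identifier or number) are illustrative assumptions, not JavaCC output.

```java
import java.util.List;

// Hand-written recursive-descent recognizer mirroring Termo/Termo2 (hypothetical, not JavaCC output).
class TermoDemo {
    private final List<String> tokens;
    private int pos = 0;

    TermoDemo(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private static boolean isMulOp(String t) {
        return t.equals("*") || t.equals("/") || t.equals("div") || t.equals("mod") || t.equals("and");
    }

    // Simplified Fator (assumption): a single identifier or unsigned integer that is not an operator word.
    private void fator() {
        String t = peek();
        if (t.matches("[A-Za-z][A-Za-z0-9]*|[0-9]+") && !isMulOp(t)) {
            pos++;
        } else {
            throw new IllegalStateException("expected a factor, found " + t);
        }
    }

    // Termo -> Fator Termo2
    void termo() { fator(); termo2(); }

    // Termo2 -> (OPERADOR_MULTIPLICATIVO Fator Termo2)?
    private void termo2() {
        if (isMulOp(peek())) {
            pos++;
            fator();
            termo2();
        }
        // otherwise take the empty alternative and consume nothing
    }

    // Same language as termo(), with the tail recursion written as a loop: Fator (op Fator)*
    void termoLoop() {
        fator();
        while (isMulOp(peek())) {
            pos++;
            fator();
        }
    }

    // True if the whole token list is a Termo.
    static boolean accepts(List<String> tokens, boolean useLoop) {
        TermoDemo p = new TermoDemo(tokens);
        try {
            if (useLoop) p.termoLoop(); else p.termo();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> ok = List.of("a", "mod", "b", "*", "2");
        List<String> bad = List.of("a", "mod");
        System.out.println(accepts(ok, false) + " " + accepts(ok, true));   // true true
        System.out.println(accepts(bad, false) + " " + accepts(bad, true)); // false false
    }
}
```

The recursive `termo2` and the loop accept the same inputs; in JavaCC the loop form corresponds to writing `Fator() (<OPERADOR_MULTIPLICATIVO> Fator())*` directly instead of a separate Termo2 production.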

Upvotes: 1

Views: 612

Answers (2)

Theodore Norvell

Reputation: 16241

The problem is that you are using LOOKAHEAD in a way that just won't work. For example you have

  LOOKAHEAD(2)Expressao_simples()
| Expressao_simples() <OPERADOR_RELACIONAL> Expressao_simples()

So this says that if the next two tokens of input are consistent with an Expressao_simples, take the first alternative; otherwise, take the second alternative. Clearly, in any situation where the second alternative might succeed, the next two tokens will be consistent with the first alternative too, so the first will be chosen.

Instead, you can delay the choice until later:

Expressao_simples()
( 
    <OPERADOR_RELACIONAL> Expressao_simples()
)?
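To make the difference concrete on the failing input `if b = 0 then`, here is a hand-written Java sketch (an illustration, not code generated by JavaCC) of both versions of Expressao. `committedExpressao` models the original choice point: because the next tokens fit a simple expression, it commits to the first alternative and stops before "=", so the caller then expects "then" and finds "=". `deferredExpressao` parses the simple expression first and only afterwards checks for a relational operator, as in the rewrite above. Simple expressions are reduced to a single identifier or number, the error text only loosely imitates JavaCC's, and the class and method names are hypothetical.

```java
import java.util.List;

// Hand-written stand-in for the generated parser (hypothetical; simple expressions are one ID or number).
class ExpressaoDemo {
    private static final List<String> RELATIONAL = List.of("=", "<>", "<", "<=", ">=", ">");
    private final List<String> tokens;
    private int pos = 0;

    ExpressaoDemo(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("Encountered \"" + peek() + "\", was expecting \"" + t + "\"");
        }
        pos++;
    }

    private boolean startsSimple() { return peek().matches("[A-Za-z][A-Za-z0-9]*|[0-9]+"); }

    private void expressaoSimples() {
        if (!startsSimple()) {
            throw new IllegalStateException("expected a simple expression, found \"" + peek() + "\"");
        }
        pos++;
    }

    private void relational() {
        if (!RELATIONAL.contains(peek())) {
            throw new IllegalStateException("expected a relational operator, found \"" + peek() + "\"");
        }
        pos++;
    }

    // Original shape: commit to the first alternative whenever the input fits a simple expression.
    void committedExpressao() {
        if (startsSimple()) {
            expressaoSimples();          // first alternative: Expressao_simples() only
        } else {
            expressaoSimples();          // second alternative: reached only when a simple expression
            relational();                // cannot start here, so it fails on its first step anyway
            expressaoSimples();
        }
    }

    // Deferred choice: Expressao_simples() ( <OPERADOR_RELACIONAL> Expressao_simples() )?
    void deferredExpressao() {
        expressaoSimples();
        if (RELATIONAL.contains(peek())) {
            pos++;
            expressaoSimples();
        }
    }

    // Parses "if <expressao> then"; returns null on success, otherwise the error message.
    static String parseIfPrefix(List<String> tokens, boolean deferred) {
        ExpressaoDemo p = new ExpressaoDemo(tokens);
        try {
            p.expect("if");
            if (deferred) p.deferredExpressao(); else p.committedExpressao();
            p.expect("then");
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("if", "b", "=", "0", "then");
        System.out.println(parseIfPrefix(input, false)); // Encountered "=", was expecting "then"
        System.out.println(parseIfPrefix(input, true));  // null
    }
}
```

The committed version fails with the same shape of error as the one in the question, while the deferred version consumes `b = 0` and then finds "then" as expected.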

Compare this code to the diagram in the Pascal Report (revised).

Upvotes: 3

MartynA

Reputation: 30735

I don't have this particular parser generator available to test, but it seems strange that the parser apparently regards '= ' as a single token. I would investigate that first. If that doesn't reveal the problem, the next thing to investigate is your definition of Expressao_simples.

I'm afraid the easiest way to investigate a problem like this is a bit painful: temporarily strip the grammar back to the simplest possible case, check whether the parser accepts it, and if it does, expand the grammar and re-test until you identify the problem.

In other words, start by defining PROGRAM as

PROGRAM : "a" "=" "b"

If the parser accepts that, try

PROGRAM : IDENTIFIER "=" IDENTIFIER

then

PROGRAM : IDENTIFIER RELATIONALOPERATOR IDENTIFIER

then

PROGRAM : SIMPLEEXPRESSION RELATIONALOPERATOR SIMPLEEXPRESSION

etc. Eventually, you should find the construct which is causing the problem.
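As an illustration of one intermediate step, here is a hand-written Java stand-in (not JavaCC) for the third stage, `PROGRAM : IDENTIFIER RELATIONALOPERATOR IDENTIFIER`, with a small loop that runs a few test inputs through it. The regex tokenizer tries two-character operators before one-character ones, so `x <= y` yields the single token `<=`, and printing the token list lets you check directly which token images the input produces, which is the first thing suggested above. The class name `StageThree` and the test inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for one stage of the incremental approach; not JavaCC output.
class StageThree {
    // Two-character operators come first so "<=" is not split into "<" and "=".
    private static final Pattern TOKEN =
        Pattern.compile("\\s*([A-Za-z][A-Za-z0-9]*|<>|<=|>=|=|<|>)");
    private static final Pattern IDENT = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    private static final List<String> RELATIONAL = List.of("=", "<>", "<", "<=", ">=", ">");

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                if (input.substring(pos).isBlank()) break;   // only trailing whitespace remains
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            out.add(m.group(1));                             // token image without surrounding spaces
            pos = m.end();
        }
        return out;
    }

    // PROGRAM : IDENTIFIER RELATIONALOPERATOR IDENTIFIER
    static boolean accepts(String input) {
        List<String> t;
        try {
            t = tokenize(input);
        } catch (IllegalArgumentException e) {
            return false;
        }
        return t.size() == 3
            && IDENT.matcher(t.get(0)).matches()
            && RELATIONAL.contains(t.get(1))
            && IDENT.matcher(t.get(2)).matches();
    }

    public static void main(String[] args) {
        for (String s : List.of("a = b", "x <= y", "a = = b", "a b")) {
            System.out.println(s + " -> " + tokenize(s) + " accepted=" + accepts(s));
        }
    }
}
```

Once a stage like this passes, the next one replaces the identifiers with simple expressions, and so on, exactly as in the sequence above.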

I'd say "good luck!", but you don't really need it, just a lot of patience and simple test cases.

Upvotes: 2
