tah20

Reputation: 21

Wrong result from my ANTLR grammar

I have an ANTLR grammar and an input file for testing it, but I can't work out what is wrong with the output.

This is my grammar:

grammar proj;

start
    : (assign|define|read|write|condition|while|module|callingmodule)+
    ;

assign
    : T_ID T_ENTESAB expentesab T_SEPARATOR
    ;

expentesab
    : T_ID
    | T_NUMBER
    | T_SABETMANTEGHI
    | expentesab operator expentesab
    | expentesab operator
    | operator expentesab
    | T_PARANTEZBAZ expentesab T_PARANTEZBASTE
    | expentesab T_COMMA expentesab
    | T_PARANTEZBAZ expentesab T_COMMA expentesab T_PARANTEZBASTE
    | expentesab T_COMMA T_PARANTEZBAZ expentesab T_PARANTEZBASTE
    ;

operator
    : T_ADD
    | T_SUB
    | T_MUL
    | T_DIV
    | T_POW
    | T_FACT
    | T_AND
    | T_OR
    | T_XOR
    ;

define
    : T_ID T_2POINT T_TYPE T_SEPARATOR
    ;

read
    : T_READ expread T_SEPARATOR
    ;

expread
    : T_ID
    | T_NUMBER
    | operator
    | T_PARANTEZBAZ expread T_PARANTEZBASTE
    | expread operator expread
    ;

write
    : T_WRITE expwrite T_SEPARATOR
    ;

expwrite
    : T_ID
    | T_NUMBER
    | operator
    | T_PARANTEZBAZ expwrite T_PARANTEZBASTE
    | expwrite operator expwrite
    | expwrite T_COMPARE expwrite  
    ;

condition
    : T_IF expcon T_THEN code_if T_ELSE code_if T_SEPARATOR
    | T_IF expcon T_THEN code_if T_SEPARATOR
    ;

expcon
    : assign
    | define
    | expcon T_COMPARE expcon
    | expcon operator expcon
    | operator expcon
    ;

code_if
    : condition
    | block
    | define
    | assign
    | callingmodule
    | code_if operator code_if
    | T_PARANTEZBAZ code_if T_PARANTEZBASTE T_SEPARATOR
    | T_PARANTEZBAZ code_if operator code_if T_PARANTEZBASTE T_SEPARATOR
    ;

callingmodule
    : T_ID T_PARANTEZBAZ params T_PARANTEZBASTE T_SEPARATOR
    | T_ID T_PARANTEZBAZ  T_PARANTEZBASTE T_SEPARATOR
    ;

params
    : expparam(T_COMMA expparam)*
    ;

expparam
    : T_ID
    | shart
    | T_ID operator T_NUMBER
    | T_PARANTEZBAZ expparam T_PARANTEZBASTE
    ;

while
    : T_WHILE expwhile code_while
    ;

expwhile
    : T_SABETMANTEGHI
    | T_NUMBER
    | T_PARANTEZBAZ expwhile T_PARANTEZBASTE
    | expwhile operator expwhile
    | T_ID T_COMPARE T_ID
    | expwhile T_AND expwhile
    | expwhile T_OR expwhile
    | expwhile T_XOR expwhile      
    ;  

code_while 
    : block
    | module
    | callingmodule
    | define
    | assign
    | code_while operator code_while T_SEPARATOR
    | T_PARANTEZBAZ code_while T_PARANTEZBASTE T_SEPARATOR
    | T_PARANTEZBAZ code_while operator code_while T_PARANTEZBASTE T_SEPARATOR
    ;
        
block
    : T_BEGIN  inner_block  T_END
    ;

inner_block
    : define
    | assign
    | condition 
    | callingmodule
    | block
    | T_ID operator T_ID
    | T_PARANTEZBAZ inner_block T_PARANTEZBASTE T_SEPARATOR
    | T_PARANTEZBAZ T_ID operator T_ID T_PARANTEZBASTE T_SEPARATOR
    ;

module
    : T_MODULE T_ID T_INPUT T_2POINT (define)+ T_OUTPUT T_2POINT T_TYPE block
    | T_MODULE T_ID block 
    ;

shart
    : expcon T_CONDITION code_if T_2POINT code_if
    | expcon T_CONDITION code_if T_2POINT code_if T_SEPARATOR
    ;
        
T_TYPE: ('s'|'S')('t'|'T')('r'|'R')('i'|'I')('n'|'N')('g'|'G')|('r'|'R')('e'|'E')('a'|'A')('l'|'L')|
('b'|'B')('o'|'O')('o'|'O')('l'|'L');
T_END: ('e'|'E')('n'|'N')('d'|'D');
T_BEGIN:('b'|'B')('e'|'E')('g'|'G')('i'|'I')('n'|'N');
T_WHILE:('w'|'W')('h'|'H')('i'|'I')('l'|'L')('e'|'E');
T_IF:('i'|'I')('f'|'F');
T_THEN:('t'|'T')('h'|'H')('e'|'E')('n'|'N');
T_ELSE:('e'|'E')('l'|'L')('s'|'S')('e'|'E');
T_READ:('r'|'R')('e'|'E')('a'|'A')('d'|'D');
T_WRITE:('w'|'W')('r'|'R')('i'|'I')('t'|'T')('e'|'E');
T_MODULE:('M'|'m')('O'|'o')('D'|'d')('U'|'u')('L'|'l')('E'|'e');
T_INPUT:('I'|'i')('N'|'n')('P'|'p')('U'|'u')('T'|'t');
T_OUTPUT:('O'|'o')('U'|'u')('T'|'t')('P'|'p')('U'|'u')('T'|'t');
T_RETURN:('R'|'r')('E'|'e')('T'|'t')('U'|'u')('R'|'r')('N'|'n');
T_SEPARATOR : ';';
T_SABETMANTEGHI: ('t'|'T')('r'|'R')('u'|'U')('e'|'E')|('f'|'F')('a'|'A')('l'|'L')('s'|'S')('e'|'E');
T_NUMBER:T_HEXNUMBER|T_INTEGERNUMBER;
T_HEXNUMBER: '0' ('x'|'X') ('0'..'9'|'a'..'f'|'A'..'F')+|'0' ('x'|'X') ('0'..'9'|'a'..'f'|'A'..'F')+ '.' ('0'..'9'|'a'..'f'|'A'..'F')+;
T_INTEGERNUMBER:(('0'..'9')+|('0'..'9')+ '.'('0'..'9')+);
T_FUNC:('F'|'f')('U'|'u')|('N'|'n')('C'|'c');
T_ADD: '+';
T_SUB: '-';
T_MUL: '*';
T_DIV: '/';
T_POW: '^';
T_FACT: '!';
T_ENTESAB:'=';
T_X:'x'|'X';
T_AND: ('a'|'A')('n'|'N')('d'|'D');
T_OR: ('o'|'O')('r'|'R');
T_NOT: ('n'|'N')('o'|'O')('t'|'T');
T_XOR: ('x'|'X')('o'|'O')('r'|'R');
T_COMPARE: '>'| '<'| '>='|'<='|  '<>';
T_REMAIN: '%';
T_CONDITION:'?';
T_2POINT:':';
T_PARANTEZBAZ:'(';
T_PARANTEZBASTE:')';
T_COMMA:',';
T_COMMENT:T_COM1LINE|T_COMMULLINE;
T_COM1LINE: '%%' ~( '\t'|'\r')+ -> skip ;
T_COMMULLINE:'%%%' (.|('\t'|'\r'|' '|'\n'))*? '%%%' ->skip;
T_ID :   [a-zA-Z] ([a-zA-Z]|('0'..'9'))*;
T_WS : (('\t'|'\r'|' ')+) ->skip;
T_NEWLINE:('\n')->skip;
T_LEXICALERROR:.;

And this is my input file:

%%%This is a sample Written in QUPLA $
@The program compute fibonacci serie%%%
module func
input:
    X:real;
output:
  i:real;
begin
    if x> 0 then
    begin 
        return Func(x-1)+func(x-2);
    end
    begin
    return 1;
    end
end
%% This is the main module &%*&()
module main
begin
    i:real;
    read i;
    write (func(i)?1:2);
end

For this input, I have these errors:

In line 5 expecting T_ID but i have T_ID!

In line 8 expecting T_IF,T_WHILE T_READ.... But I have T_IF

Upvotes: 2

Views: 102

Answers (1)

quepas

Reputation: 1003

Let's start with your errors.

Answers

In line 5 expecting T_ID but i have T_ID!

This error occurs because you have the lexer rule T_X:'x'|'X';, which matches the X on line 5 of your sample code. Both T_X and T_ID match that single character, and when two lexer rules match the same length of input, ANTLR picks the one defined first. T_X is defined before T_ID, so it wins. The answer is: it is not a T_ID token but a T_X.
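A minimal sketch of one way to resolve the clash, assuming T_X is not actually needed (no parser rule in the grammar above references it, and T_HEXNUMBER spells out its own 'x'|'X'). It is meant to illustrate the tie-breaking behaviour, not to prescribe the fix:

```antlr
// Before: both rules match the one-character input "X".
// Equal match length, so the rule defined first (T_X) wins.
//   T_X  : 'x' | 'X' ;
//   T_ID : [a-zA-Z] ([a-zA-Z] | ('0'..'9'))* ;

// After (assumption: T_X is unused and can be dropped): "X" now reaches T_ID.
T_ID : [a-zA-Z] ([a-zA-Z] | ('0'..'9'))* ;
```

Keyword rules such as T_IF still come before T_ID, so they keep winning ties for their exact spellings, while a longer identifier such as ifx is lexed as T_ID because the longest match is preferred.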

In line 8 expecting T_IF,T_WHILE T_READ.... But I have T_IF

On line 7 of the code example you are trying to define an output variable, i:real. But your module rule has no define+ in the output section of a module definition. I assume you want to allow named output parameters. In that case, the module rule should look as follows:

module
    : T_MODULE T_ID T_INPUT T_2POINT define+ T_OUTPUT T_2POINT define+ block  // no bare T_TYPE: each define already carries its type
    | T_MODULE T_ID block
    ;

Because define+ is missing, the parser cannot complete the module rule, and everything after output: on line 6 is treated as a define alternative of the top-level start rule.

If that is not the case and your code example is wrong, then you should remove the i: characters (and the trailing ;, since your module rule expects only a T_TYPE before the block) from the output section of the module.

Either way, the answer is: the code example is inconsistent with your grammar.

Modifications

Rearrange your tokens

You should define your tokens in this order:

  1. Skipped tokens (e.g. whitespaces, comments)
  2. Specialized tokens (e.g. keywords, operators, literals)
  3. General tokens (e.g. identifier)
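The ordering above could look like the following. This is only a sketch: the grammar name OrderSketch is hypothetical, only a handful of the question's tokens are shown, and T_NUMBER is simplified to decimal numbers.

```antlr
lexer grammar OrderSketch;   // hypothetical name; file would be OrderSketch.g4

// 1. Skipped tokens
T_WS     : [ \t\r\n]+ -> skip ;

// 2. Specialized tokens: keywords, operators, literals
T_IF     : [iI] [fF] ;
T_WHILE  : [wW] [hH] [iI] [lL] [eE] ;
T_ADD    : '+' ;
T_NUMBER : [0-9]+ ('.' [0-9]+)? ;   // simplified: no hex form

// 3. General tokens: last, so keywords win when both match the same text
T_ID     : [a-zA-Z] [a-zA-Z0-9]* ;
```

With this layout the input if is lexed as T_IF (same length as the T_ID match, earlier rule wins), and iffy is lexed as T_ID (longer match wins).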

Pay attention to rule naming

You can't use rule names that are reserved words in the target language you generate code for with ANTLRv4. You defined a grammar rule named while, which conflicts with the while keyword in Java.
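One way to avoid the clash is to rename the rule and update every reference to it. The name whileLoop below is an arbitrary choice for illustration. Any identifier that is not a Java keyword should work:

```antlr
start
    : (assign|define|read|write|condition|whileLoop|module|callingmodule)+
    ;

// Renamed from "while", which is a reserved word in the Java target.
whileLoop
    : T_WHILE expwhile code_while
    ;
```

The token name T_WHILE can stay as it is, since only the parser rule name collided with the keyword.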

Readability and simplicity are key

Use simpler ANTLRv4 constructs that are easier on the eye:

  • Change T_WS : (('\t'|'\r'|' ')+) ->skip; to T_WS : [ \t\r]+ -> skip;
  • Change T_ID : [a-zA-Z] ([a-zA-Z]|('0'..'9'))*; to T_ID : [a-zA-Z] [a-zA-Z0-9]*;
  • Change T_COMMULLINE:'%%%' (.|('\t'|'\r'|' '|'\n'))*? '%%%' ->skip; to T_COMMULLINE:'%%%' .*? '%%%' -> skip; (the dot . matches any character anyway, including whitespace)

Upvotes: 1
