Anas El Kesri
Anas El Kesri

Reputation: 51

How to define unrecognized rules in Ocamlyacc

I'm working on company projet, where i have to create a compilator for a language using Ocamlyacc and Ocamllex. I want to know if is it possible to define a rule in my Ocamlyacc Parser that can tell me that no rules of my grammar matching the syntax of an input.

I have to insist that i'am a beginner in Ocamllex/Ocamlyacc

Thank you a lot for your help.

Upvotes: 0

Views: 209

Answers (2)

ivg
ivg

Reputation: 35280

If no rule in your grammar matches the input, then Parsing.Parse_error exception is raised. Usually, this is what you want.

There is also a special token called error that allows you to resynchronize your parser state. You can use it in your rules, as it was a real token produced by a lexer, cf., eof token.

Also, I would suggest to use menhir instead of more venerable ocamlyacc. It is easier to use and debug, and it also comes with a good library of predefined grammars.

Upvotes: 3

alifirat
alifirat

Reputation: 2937

When you write a compiler for a language, the first step is to run your lexer and to check if your program is good from a lexical point of view.

See the below example :

{
  open Parser        (* The type token is defined in parser.mli *)
  exception Eof
}
  rule token = parse
            [' ' '\t']     { token lexbuf }     (* skip blanks *)
          | ['\n' ]        { EOL }
          | ['0'-'9']+ as lxm { INT(int_of_string lxm) }
          | '+'            { PLUS }
          | '-'            { MINUS }
          | '*'            { TIMES }
          | '/'            { DIV }
          | '('            { LPAREN }
          | ')'            { RPAREN }
          | eof            { raise Eof }

It's a lexer to recognize some arithmetic expressions.

If your lexer accepts the input then you give the sequence of lexemes to the parser which try to find if a AST can be build with the specified grammar. See :

%token <int> INT
        %token PLUS MINUS TIMES DIV
        %token LPAREN RPAREN
        %token EOL
        %left PLUS MINUS        /* lowest precedence */
        %left TIMES DIV         /* medium precedence */
        %nonassoc UMINUS        /* highest precedence */
        %start main             /* the entry point */
        %type <int> main
        %%
        main:
            expr EOL                { $1 }
        ;
        expr:
            INT                     { $1 }
          | LPAREN expr RPAREN      { $2 }
          | expr PLUS expr          { $1 + $3 }
          | expr MINUS expr         { $1 - $3 }
          | expr TIMES expr         { $1 * $3 }
          | expr DIV expr           { $1 / $3 }
          | MINUS expr %prec UMINUS { - $2 }
        ;

This is a little program to parse arithmetic expression. A program can be rejected at this step because there is no rule of the grammar to apply in order to have an AST at the end. There is no way to define unrecognized rules but you need to write a grammar which define how a program can be accepted or rejected.

let _ =
  try
    let lexbuf = Lexing.from_channel stdin in
    while true do
      let result = Parser.main Lexer.token lexbuf in
      print_int result; print_newline(); flush stdout
    done
  with Lexer.Eof ->
    exit 0

If your compile the lexer, the parser and the last program, you have :

  1. 1 + 2 is accepted because there is no error lexical errors and an AST can be build corresponding to this expression.
  2. 1 ++ 2 is rejected : no lexical errors but there is no rule to build a such AST.

You can found more documentation here : http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html

Upvotes: 1

Related Questions