Translate one term differently in one program

Question

I am trying to build a frontend for a kind of program. It has 2 particularities:

1) When a string begins with =, I want to read the rest of the string as a formula instead of a string value. For instance, "123", "TRUE" and "TRUE+123" have type string, while "=123", "=TRUE" and "=TRUE+123" have type Syntax.formula. For reference, the types are:

(* in syntax.ml *)
and expression =
  | E_formula of formula
  | E_string of string
  ...

and formula =
  | F_int of int  
  | F_bool of bool
  | F_Plus of formula * formula
  | F_RC of rc

and rc =
  | RC of int * int

2) Inside a formula, some strings are interpreted differently than outside. For instance, in the command R4C5 := 4, R4C5 is actually a variable and is treated as an identifier, while in "=123+R4C5", which should be translated to a formula, R4C5 should be translated to RC (4,5): rc.

I don't know how to implement this, or whether it needs 1 or 2 lexers and 1 or 2 parsers.

At the moment, I am trying to do everything in 1 lexer and 1 parser. Here is part of the code, which doesn't work: it still treats R4C5 as an identifier instead of an rc:

(* in lexer.mll *)
let begin_formula =  double_quote "=" 
let end_formula = double_quote
let STRING = double_quote ([^ "=" ])* double_quote

rule token = parse
  ...
  | begin_formula { BEGIN_FORMULA }
  | 'R'      { R }
  | 'C'      { C }
  | end_formula { END_FORMULA }
  | lex_identifier as li
      { try Hashtbl.find keyword_table (lowercase li)  
        with Not_found -> IDENTIFIER li }
  | STRING as s { STRING s }
  ...

(* in parser.mly *)
expression:
| BEGIN_FORMULA f = formula END_FORMULA { E_formula f }
| s = STRING { E_string s }
...

formula:
| i = INTEGER { F_int i }
| b = BOOL { F_bool b }
| f0 = formula PLUS f1 = formula { F_Plus (f0, f1) }  
| rc { F_RC $1 }

rc:
| R i0 = INTEGER C i1 = INTEGER { RC (i0, i1) }

Could anyone help?

New idea: I am thinking of sticking with 1 lexer + 1 parser and creating an entry point for formulas in the lexer, as is normally done for comments. Here are some updates to lexer.mll and parser.mly:

(* in lexer.mll *)
rule token = parse
...
| begin_formula { formula lexbuf }
...
| INTEGER as i { INTEGER (int_of_string i)  }
| '+'      { PLUS }
...

and formula = parse
| end_formula { token lexbuf }
| INTEGER as i { INTEGER_F (int_of_string i)  }
| 'R'      { R }
| 'C'      { C }
| '+'      { PLUS_F }
| _        { raise (Lexing_error ("unknown in formula")) }

(* in parser.mly *)
expression:
| f = formula { E_formula f }
...

formula:
| i = INTEGER_F { F_int i }
| f0 = formula PLUS_F f1 = formula { F_Plus (f0, f1) }  
...

I have done some tests, for instance parsing "=R4". The problem is that it handles R correctly, but it treats 4 as INTEGER instead of INTEGER_F. It seems that formula lexbuf needs to be called from time to time in the body of the formula entry point (though I don't understand why lexing in the body of the token entry point works without always mentioning token lexbuf). I have tried several possibilities, such as | 'R' { R; formula lexbuf } and | 'R' { formula lexbuf; R }, but none of them worked. Could anyone help?

gasche · Accepted Answer

I think the simplest choice would be to have two different lexers and two different parsers, and to call the lexer and parser for formulas from inside the global parser. Afterwards you can see how much is shared between the two grammars, and factor out the common parts when possible.
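To make the shape of that suggestion concrete, here is a minimal sketch in plain OCaml. It is not the ocamllex/Menhir code from the question: the formula lexer and parser are hand-written stand-ins so that the example is self-contained. The names ftoken, lex_formula, parse_formula and expression_of_string_literal are hypothetical. The types are the ones from syntax.ml. The point is only that R<row>C<col> is recognised by the formula lexer and nowhere else, and that the outer level hands the text after = to the formula lexer and parser.

```ocaml
(* Types copied from syntax.ml in the question. *)
type rc = RC of int * int

type formula =
  | F_int of int
  | F_bool of bool
  | F_Plus of formula * formula
  | F_RC of rc

type expression =
  | E_formula of formula
  | E_string of string

(* Tokens of the formula sub-language only (hypothetical names). *)
type ftoken = T_int of int | T_bool of bool | T_rc of int * int | T_plus

let is_digit c = c >= '0' && c <= '9'

(* Formula lexer: here, and only here, R<row>C<col> is a cell reference. *)
let lex_formula (s : string) : ftoken list =
  let n = String.length s in
  (* Read a run of digits starting at i; return its value and the next index. *)
  let read_int i =
    let j = ref i in
    while !j < n && is_digit s.[!j] do incr j done;
    if !j = i then failwith "integer expected";
    (int_of_string (String.sub s i (!j - i)), !j)
  in
  (* Does the keyword p occur in s at position i? *)
  let starts_with p i =
    let l = String.length p in
    i + l <= n && String.sub s i l = p
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' -> go (i + 1) acc
      | '+' -> go (i + 1) (T_plus :: acc)
      | 'R' ->
        let row, j = read_int (i + 1) in
        if j >= n || s.[j] <> 'C' then failwith "'C' expected";
        let col, k = read_int (j + 1) in
        go k (T_rc (row, col) :: acc)
      | c when is_digit c ->
        let v, j = read_int i in
        go j (T_int v :: acc)
      | _ when starts_with "TRUE" i -> go (i + 4) (T_bool true :: acc)
      | _ when starts_with "FALSE" i -> go (i + 5) (T_bool false :: acc)
      | _ -> failwith "unknown character in formula"
  in
  go 0 []

(* Formula parser: operand ('+' operand)*, left-associative. *)
let parse_formula (toks : ftoken list) : formula =
  let operand = function
    | T_int i :: rest -> (F_int i, rest)
    | T_bool b :: rest -> (F_bool b, rest)
    | T_rc (r, c) :: rest -> (F_RC (RC (r, c)), rest)
    | _ -> failwith "operand expected"
  in
  let rec loop lhs = function
    | [] -> lhs
    | T_plus :: rest ->
      let rhs, rest' = operand rest in
      loop (F_Plus (lhs, rhs)) rest'
    | _ -> failwith "'+' expected"
  in
  let first, rest = operand toks in
  loop first rest

(* What the outer parser would do with the contents of a string literal:
   a leading '=' switches to the formula lexer and parser. *)
let expression_of_string_literal (contents : string) : expression =
  let n = String.length contents in
  if n > 0 && contents.[0] = '=' then
    E_formula (parse_formula (lex_formula (String.sub contents 1 (n - 1))))
  else E_string contents
```

With generated lexers and parsers the structure would presumably be the same: the outer lexer returns every string literal as a single token, and the semantic action for that token runs the formula parser's entry point over Lexing.from_string applied to the text after =. The outer lexer then never needs R or C tokens, so R4C5 := 4 keeps lexing R4C5 as an ordinary identifier.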
