SoftTimur

Reputation: 5540

Translate one term differently in one program

I am trying to write a front end for a family of programs. It has two particularities:

1) When a string begins with =, I want to read the rest of the string as a formula instead of a string value. For instance, "123", "TRUE", and "TRUE+123" have type string, while "=123", "=TRUE", and "=TRUE+123" have type Syntax.formula. For reference, the relevant types are:

(* in syntax.ml *)
and expression =
  | E_formula of formula
  | E_string of string
  ...

and formula =
  | F_int of int  
  | F_bool of bool
  | F_Plus of formula * formula
  | F_RC of rc

and rc =
  | RC of int * int

2) Inside a formula, some strings are interpreted differently than outside. For instance, in the command R4C5 := 4, R4C5 is a variable and is lexed as an identifier, while in "=123+R4C5", which should be translated to a formula, R4C5 should be translated to RC (4,5): rc.

I don't know how to implement this, whether with one or two lexers and one or two parsers.

At the moment, I am trying to do everything in one lexer and one parser. Here is part of the code, which doesn't work: it still treats R4C5 as an identifier instead of an rc:

(* in lexer.mll *)
let begin_formula =  double_quote "=" 
let end_formula = double_quote
let STRING = double_quote ([^ "=" ])* double_quote

rule token = parse
  ...
  | begin_formula { BEGIN_FORMULA }
  | 'R'      { R }
  | 'C'      { C }
  | end_formula { END_FORMULA }
  | lex_identifier as li
      { try Hashtbl.find keyword_table (lowercase li)  
        with Not_found -> IDENTIFIER li }
  | STRING as s { STRING s }
  ...

(* in parser.mly *)
expression:
| BEGIN_FORMULA f = formula END_FORMULA { E_formula f }
| s = STRING { E_string s }
...

formula:
| i = INTEGER { F_int i }
| b = BOOL { F_bool b }
| f0 = formula PLUS f1 = formula { F_Plus (f0, f1) }  
| rc { F_RC $1 }

rc:
| R i0 = INTEGER C i1 = INTEGER { RC (i0, i1) }

Could anyone help?

New idea: I am thinking of sticking with one lexer and one parser, and creating an entry point for formulas in the lexer, as is usually done for comments. Here are the updates to lexer.mll and parser.mly:

(* in lexer.mll *)
rule token = parse
...
| begin_formula { formula lexbuf }
...
| INTEGER as i { INTEGER (int_of_string i)  }
| '+'      { PLUS }
...

and formula = parse
| end_formula { token lexbuf }
| INTEGER as i { INTEGER_F (int_of_string i)  }
| 'R'      { R }
| 'C'      { C }
| '+'      { PLUS_F }
| _        { raise (Lexing_error ("unknown in formula")) }

(* in parser.mly *)
expression:
| f = formula { E_formula f }
...

formula:
| i = INTEGER_F { F_int i }
| f0 = formula PLUS_F f1 = formula { F_Plus (f0, f1) }  
...

I have run some tests. For instance, when parsing "=R4", R is lexed correctly, but 4 is lexed as INTEGER instead of INTEGER_F. It seems that formula lexbuf needs to be called again somewhere in the body of the formula entry point (though I don't understand why lexing in the body of the token entry point works without always mentioning token lexbuf). I have tried several possibilities, such as | 'R' { R; formula lexbuf } and | 'R' { formula lexbuf; R }, but none of them worked. Could anyone help?

Upvotes: 2

Views: 90

Answers (1)

gasche

Reputation: 31469

I think the simplest choice would be to have two different lexers and two different parsers, and to call the formula lexer and parser from inside the global parser. Afterwards you can see how much is shared between the two grammars and factor out the common parts where possible.
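
To make the idea concrete, here is a minimal sketch of a separate formula lexer and parser that the outer parser could call on the content of a string literal. It is an illustration under assumptions, not code from the question: in a real project both pieces would more likely be generated by ocamllex and Menhir, whereas here they are hand-written so that the example stays self-contained. The names ftoken, lex_formula, parse_formula, expression_of_literal and Formula_error are hypothetical, and only the formula cases shown in the question are handled.

```ocaml
(* Types copied from the question's syntax.ml (only the cases shown there). *)
type rc = RC of int * int

type formula =
  | F_int of int
  | F_bool of bool
  | F_Plus of formula * formula
  | F_RC of rc

type expression =
  | E_formula of formula
  | E_string of string

(* Tokens of the formula sub-language only (hypothetical names). *)
type ftoken = T_int of int | T_bool of bool | T_rc of int * int | T_plus

exception Formula_error of string

let is_digit c = c >= '0' && c <= '9'

(* Formula lexer: in this lexer "R4C5" is a cell reference, never an identifier. *)
let lex_formula (s : string) : ftoken list =
  let n = String.length s in
  (* Read a non-empty run of digits starting at i; return its value and the next index. *)
  let read_int i =
    let j = ref i in
    while !j < n && is_digit s.[!j] do incr j done;
    if !j = i then raise (Formula_error "integer expected");
    (int_of_string (String.sub s i (!j - i)), !j)
  in
  let starts_with i w =
    let l = String.length w in
    i + l <= n && String.sub s i l = w
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else if s.[i] = ' ' then go (i + 1) acc
    else if s.[i] = '+' then go (i + 1) (T_plus :: acc)
    else if is_digit s.[i] then
      (let v, j = read_int i in go j (T_int v :: acc))
    else if starts_with i "TRUE" then go (i + 4) (T_bool true :: acc)
    else if starts_with i "FALSE" then go (i + 5) (T_bool false :: acc)
    else if s.[i] = 'R' then begin
      let r, j = read_int (i + 1) in
      if j >= n || s.[j] <> 'C' then raise (Formula_error "C expected");
      let c, k = read_int (j + 1) in
      go k (T_rc (r, c) :: acc)
    end
    else raise (Formula_error "unknown character in formula")
  in
  go 0 []

(* Formula parser: atom ('+' atom)*, left-associative. *)
let parse_formula (s : string) : formula =
  let atom = function
    | T_int i :: rest -> (F_int i, rest)
    | T_bool b :: rest -> (F_bool b, rest)
    | T_rc (r, c) :: rest -> (F_RC (RC (r, c)), rest)
    | _ -> raise (Formula_error "operand expected")
  in
  let rec tail left = function
    | [] -> left
    | T_plus :: rest ->
      let right, rest' = atom rest in
      tail (F_Plus (left, right)) rest'
    | _ -> raise (Formula_error "'+' expected")
  in
  let first, rest = atom (lex_formula s) in
  tail first rest

(* What the outer parser's action for a string literal could do.
   body is the literal's content without the surrounding double quotes. *)
let expression_of_literal (body : string) : expression =
  if String.length body > 0 && body.[0] = '=' then
    E_formula (parse_formula (String.sub body 1 (String.length body - 1)))
  else E_string body
```

With this split, the global lexer can keep a single STRING token for every quoted literal and keep treating R4C5 as an identifier elsewhere. The decision between E_string and E_formula moves into the semantic action of the global parser, which would call something like expression_of_literal on the literal's content.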

Upvotes: 1
