Reputation: 5540
I am trying to write a front end for a certain kind of program. There are two particularities:
1) When a string begins with =, I want to read the rest of the string as a formula instead of as a string value. For instance, "123", "TRUE" and "TRUE+123" are considered to have type string, while "=123", "=TRUE" and "=TRUE+123" are considered to have type Syntax.formula. For reference, the relevant types are:
(* in syntax.ml *)
and expression =
| E_formula of formula
| E_string of string
...
and formula =
| F_int of int
| F_bool of bool
| F_Plus of formula * formula
| F_RC of rc
and rc =
| RC of int * int
2) Inside a formula, some strings are interpreted differently than outside. For instance, in the command R4C5 := 4, R4C5 is actually a variable and is treated as an identifier, while in "=123+R4C5", which should be translated to a formula, R4C5 should be translated to RC (4,5): rc.
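The context dependence described in 2) can be sketched in plain OCaml as follows. This is only an illustration of the intended behaviour: the reading type and the functions rc_of_string and read are hypothetical names, not part of syntax.ml, and a real implementation would live in the lexer and parser rather than in a helper like this.

```ocaml
(* rc mirrors the type in syntax.ml; everything else here is hypothetical. *)
type rc = RC of int * int

type reading =
  | Identifier of string  (* how the text is read outside a formula *)
  | Cell of rc            (* how the text is read inside a formula *)

(* Try to read a whole string as R<row>C<col>.
   "%!" only matches at end of input, so trailing characters are rejected. *)
let rc_of_string (s : string) : rc option =
  try Scanf.sscanf s "R%uC%u%!" (fun row col -> Some (RC (row, col)))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* The same text gets a different reading depending on the context. *)
let read ~(in_formula : bool) (s : string) : reading =
  if in_formula then
    match rc_of_string s with
    | Some rc -> Cell rc
    | None -> Identifier s
  else Identifier s
```

With these definitions, read ~in_formula:false "R4C5" gives an identifier, while read ~in_formula:true "R4C5" gives the cell RC (4, 5).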
So I don't know how to implement this: with 1 or 2 lexers, and with 1 or 2 parsers?
At the moment, I am trying to do everything in 1 lexer and 1 parser. Here is part of the code, which doesn't work: it still treats R4C5 as an identifier instead of an rc:
(* in lexer.mll *)
let begin_formula = double_quote "="
let end_formula = double_quote
let STRING = double_quote ([^ "=" ])* double_quote
rule token = parse
...
| begin_formula { BEGIN_FORMULA }
| 'R' { R }
| 'C' { C }
| end_formula { END_FORMULA }
| lex_identifier as li
{ try Hashtbl.find keyword_table (lowercase li)
with Not_found -> IDENTIFIER li }
| STRING as s { STRING s }
...
(* in parser.mly *)
expression:
| BEGIN_FORMULA f = formula END_FORMULA { E_formula f }
| s = STRING { E_string s }
...
formula:
| i = INTEGER { F_int i }
| b = BOOL { F_bool b }
| f0 = formula PLUS f1 = formula { F_Plus (f0, f1) }
| rc { F_RC $1 }
rc:
| R i0 = INTEGER C i1 = INTEGER { RC (i0, i1) }
Could anyone help?
New idea: I am thinking of sticking with 1 lexer + 1 parser, and creating an entry point for formulas in the lexer, as is normally done for comments. Here are the updates to lexer.mll and parser.mly:
(* in lexer.mll *)
rule token = parse
...
| begin_formula { formula lexbuf }
...
| INTEGER as i { INTEGER (int_of_string i) }
| '+' { PLUS }
...
and formula = parse
| end_formula { token lexbuf }
| INTEGER as i { INTEGER_F (int_of_string i) }
| 'R' { R }
| 'C' { C }
| '+' { PLUS_F }
| _ { raise (Lexing_error ("unknown in formula")) }
(* in parser.mly *)
expression:
| f = formula { E_formula f }
...
formula:
| i = INTEGER_F { F_int i }
| f0 = formula PLUS_F f1 = formula { F_Plus (f0, f1) }
...
I have done some tests, for instance parsing "=R4". The problem is that R is lexed correctly, but 4 is lexed as INTEGER instead of INTEGER_F. It seems that formula lexbuf needs to be called from time to time in the body of the formula entry point (though I don't understand why lexing in the body of the token entry point works without always mentioning token lexbuf). I have tried several possibilities, such as | 'R' { R; formula lexbuf } and | 'R' { formula lexbuf; R }, but none of them worked. Could anyone help?
Upvotes: 2
Views: 90
Reputation: 31469
I think the simplest choice would be to have two different lexers and two different parsers, and to call the lexer and parser for formulas from inside the global parser. Afterwards you can see how much is shared between the two grammars, and factor out the common parts where possible.
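A minimal sketch of that structure, assuming the global parser hands over the body of each string literal (without the surrounding quotes). To keep it self-contained, the formula side is a small hand-written recursive-descent parser rather than generated ocamllex/Menhir code; parse_formula, expression_of_literal and Formula_error are hypothetical names, and only the integer, boolean, + and RnCm cases from the question are covered.

```ocaml
(* Types as in the question's syntax.ml (the elided cases are left out). *)
type rc = RC of int * int
type formula =
  | F_int of int
  | F_bool of bool
  | F_Plus of formula * formula
  | F_RC of rc
type expression =
  | E_formula of formula
  | E_string of string

exception Formula_error of string  (* hypothetical error type *)

(* Inner lexer + parser, used for formulas only. Inside it, R4C5 can only
   be a cell reference, so there is no clash with identifiers. *)
let parse_formula (src : string) : formula =
  let n = String.length src in
  let pos = ref 0 in
  let peek () = if !pos < n then Some src.[!pos] else None in
  let fail msg =
    raise (Formula_error (Printf.sprintf "%s at offset %d" msg !pos)) in
  let integer () =
    let start = !pos in
    while (match peek () with Some ('0'..'9') -> true | _ -> false) do
      incr pos
    done;
    if !pos = start then fail "integer expected";
    int_of_string (String.sub src start (!pos - start))
  in
  let keyword kw =
    let k = String.length kw in
    if !pos + k <= n && String.sub src !pos k = kw
    then (pos := !pos + k; true)
    else false
  in
  let atom () =
    match peek () with
    | Some ('0'..'9') -> F_int (integer ())
    | Some 'R' ->
      incr pos;
      let row = integer () in
      if peek () <> Some 'C' then fail "'C' expected";
      incr pos;
      let col = integer () in
      F_RC (RC (row, col))
    | _ ->
      if keyword "TRUE" then F_bool true
      else if keyword "FALSE" then F_bool false
      else fail "integer, boolean or cell expected"
  in
  (* '+' is parsed as left-associative. *)
  let rec sum left =
    match peek () with
    | Some '+' -> incr pos; sum (F_Plus (left, atom ()))
    | None -> left
    | Some _ -> fail "'+' or end of formula expected"
  in
  sum (atom ())

(* What the global parser's action for a string literal would do:
   a leading '=' switches to the formula parser, anything else is a string. *)
let expression_of_literal (body : string) : expression =
  let n = String.length body in
  if n > 0 && body.[0] = '=' then
    E_formula (parse_formula (String.sub body 1 (n - 1)))
  else E_string body
```

With generated parsers the shape would be the same: the global lexer returns the whole literal as one STRING token, and the semantic action runs a second lexer and parser on Lexing.from_string of the part after the =. The global lexer then never needs R, C or formula-specific tokens, so R4C5 stays an identifier outside formulas.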
Upvotes: 1