Reputation: 35
I'm having trouble finding a good example of how to handle strings in ocamllex. The desktop calculator example is somewhat useful, but I haven't found a way to extend it so that it lexes strings as well. Here is the example I'm referencing:
{
open Parser (* The type token is defined in parser.mli *)
exception Eof
}
rule token = parse
[' ' '\t'] { token lexbuf } (* skip blanks *)
| ['\n' ] { EOL }
| ['0'-'9']+ as lxm { INT(int_of_string lxm) }
| '+' { PLUS }
| '-' { MINUS }
| '*' { TIMES }
| '/' { DIV }
| '(' { LPAREN }
| ')' { RPAREN }
| eof { raise Eof }
Any help would be greatly appreciated.
Upvotes: 2
Views: 1402
Reputation: 66808
I assume you're talking about double-quoted strings as in OCaml. The difficulty in lexing strings is that they need an escape mechanism so that a quote character can appear inside the string (and, usually, so that the escape character itself can appear too).
Here is a simplified version of the code for strings from the OCaml lexer itself:
let string_buff = Buffer.create 256
let char_for_backslash = function
| 'n' -> '\010'
| 'r' -> '\013'
| 'b' -> '\008'
| 't' -> '\009'
| c -> c
. . .
let backslash_escapes =
['\\' '\'' '"' 'n' 't' 'b' 'r' ' ']
. . .
rule main = parse
. . .
| '"'
{ Buffer.clear string_buff;
string lexbuf;
STRING (Buffer.contents string_buff) }
. . .
and string = parse
| '"'
{ () }
| '\\' (backslash_escapes as c)
{ Buffer.add_char string_buff (char_for_backslash c);
string lexbuf }
| _ as c
{ Buffer.add_char string_buff c;
string lexbuf }
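As a rough sketch of how this could be combined with the calculator lexer from the question. Assumptions here: the token type is declared inline so the file stands alone (in a real project you would keep `open Parser` and add a STRING token to the grammar), and the Unterminated_string exception and the eof case in the string rule are my additions, not part of the snippet above.

```ocaml
(* lexer.mll: illustrative sketch; generate lexer.ml with `ocamllex lexer.mll` *)
{
(* Assumption: tokens declared here instead of coming from Parser. *)
type token =
  | INT of int
  | STRING of string
  | PLUS | MINUS | TIMES | DIV
  | LPAREN | RPAREN
  | EOL

exception Eof
exception Unterminated_string  (* hypothetical name, added for the sketch *)

(* Accumulates the characters of the string literal being scanned. *)
let string_buff = Buffer.create 256

(* Maps the character after a backslash to the character it denotes. *)
let char_for_backslash = function
  | 'n' -> '\010'
  | 'r' -> '\013'
  | 'b' -> '\008'
  | 't' -> '\009'
  | c -> c
}

let backslash_escapes = ['\\' '\'' '"' 'n' 't' 'b' 'r' ' ']

rule token = parse
  | [' ' '\t']        { token lexbuf }  (* skip blanks *)
  | '\n'              { EOL }
  | ['0'-'9']+ as lxm { INT (int_of_string lxm) }
  | '"'               { Buffer.clear string_buff;
                        string lexbuf;  (* hand over to the string scanner *)
                        STRING (Buffer.contents string_buff) }
  | '+'               { PLUS }
  | '-'               { MINUS }
  | '*'               { TIMES }
  | '/'               { DIV }
  | '('               { LPAREN }
  | ')'               { RPAREN }
  | eof               { raise Eof }

and string = parse
  | '"'                           { () }  (* closing quote: done *)
  | '\\' (backslash_escapes as c) { Buffer.add_char string_buff (char_for_backslash c);
                                    string lexbuf }
  | eof                           { raise Unterminated_string }
  | _ as c                        { Buffer.add_char string_buff c;
                                    string lexbuf }
```

With the generated module loaded, calling token on Lexing.from_string "\"a\\tb\"" should give a STRING token whose payload contains a real tab character between a and b.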
Edit: The key feature of this code is that it uses a second scanner (named string) to do the lexical analysis within a quoted string. This generally keeps things cleaner than trying to write a single scanner for all tokens; some tokens are quite complicated. A similar technique is often used for scanning comments.
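For comments, a minimal sketch of the same idea might look like the following. It assumes OCaml-style nestable comments; the WORD and EOF tokens, the Unterminated_comment exception, and the depth parameter are all illustrative names, and the token rule only recognises letters and whitespace to keep the example short.

```ocaml
(* comments.mll: illustrative sketch; generate with `ocamllex comments.mll` *)
{
(* Assumption: a tiny token type so the file stands alone. *)
type token = WORD of string | EOF

exception Unterminated_comment  (* hypothetical name *)
}

rule token = parse
  | [' ' '\t' '\n']+        { token lexbuf }
  | "(*"                    { comment 1 lexbuf;  (* skip the whole comment *)
                              token lexbuf }     (* then resume normal scanning *)
  | ['a'-'z' 'A'-'Z']+ as w { WORD w }
  | eof                     { EOF }

(* Second scanner: depth counts how many comment openers are still unclosed. *)
and comment depth = parse
  | "(*" { comment (depth + 1) lexbuf }
  | "*)" { if depth > 1 then comment (depth - 1) lexbuf }
  | eof  { raise Unterminated_comment }
  | _    { comment depth lexbuf }
```

The comment rule returns unit and produces no token, so the main rule simply calls itself again once the comment has been consumed.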
Upvotes: 2