Joshua Snider

Reputation: 795

Print tokenization of a string

I'm currently working on a programming language as a hobby. Lexing errors would be much easier to debug if ocamllex could print the tokens it matches as it finds them. I occasionally add print statements to my rules by hand, but there should be an easier way to do that.

So what I'm asking is, given a .mll file and some input, is there an automatic way to see the corresponding tokens?

Upvotes: 3

Views: 1611

Answers (1)

Jeffrey Scofield

Reputation: 66808

I don't think there is a built-in way to ask the lexer to print its tokens.

If you use ocamlyacc, you can set the p option in OCAMLRUNPARAM to see a trace of the parser's actions. This is described in Section 12.5 of the OCaml manual. See Section 10.2 for a description of OCAMLRUNPARAM.
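
As far as I know, the same parser trace can also be switched on from inside the program with Parsing.set_trace from the standard library, which avoids touching the environment. A minimal sketch (the name previously_enabled is only for illustration):

```ocaml
(* Turn on the ocamlyacc parser trace from inside the program.
   Parsing.set_trace returns the previous setting of the flag. *)
let previously_enabled = Parsing.set_trace true

let () =
  if not previously_enabled then
    prerr_endline "parser tracing enabled for this run"
```

Like the p option, this traces the parser's actions rather than the lexer's rules.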

If you don't mind a crude hack, I just wrote a small script, lext, that adds tracing to the code generated by ocamllex:

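#!/bin/sh
#
# Print a tracing wrapper around Lexing.engine, followed by the generated
# lexer with every call to Lexing.engine redirected to the wrapper.
# The quoted here-document keeps the \n intact in any POSIX shell.
cat <<'EOF'
    let my_engine a b lexbuf =
        let res = Lexing.engine a b lexbuf in
        Printf.printf "Saw token [%s]\n" (Lexing.lexeme lexbuf);
        res
EOF
sed 's/Lexing\.engine/my_engine/g' "$@"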

It works like this:

$ cat ab.mll
rule token = parse
    [' ' '\t'] { token lexbuf }
  | '\n'       { 1 }
  | '+'        { 2 }
  | _          { 3 }
{
    let lexbuf = Lexing.from_channel stdin in
    try
        while true do
            ignore (token lexbuf)
        done
    with _ -> exit 0
}
$ ocamllex ab.mll
5 states, 257 transitions, table size 1058 bytes
$ lext ab.ml > abtraced.ml
$ ocamlopt -o abtraced abtraced.ml
$ echo 'a+b' | abtraced
Saw token []
Saw token [a]
Saw token [+]
Saw token [b]
Saw token [
]
Saw token []
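
If you would rather not rewrite the generated file, another option is to wrap the lexer function itself so that each token is printed before it is returned. The sketch below is only an illustration: the token type, show_token, and toy_token are hypothetical stand-ins (toy_token plays the role of the ocamllex-generated token function so that the example runs without ocamllex), and traced is the only part you would actually reuse.

```ocaml
(* Hypothetical token type; a real lexer would use its own. *)
type token = Newline | Plus | Other of char | Eof

(* Hypothetical printer for the token type above. *)
let show_token = function
  | Newline -> "Newline"
  | Plus -> "Plus"
  | Other c -> Printf.sprintf "Other %C" c
  | Eof -> "Eof"

(* Wrap any lexer function so that each token is printed to stderr
   before being returned to the caller. *)
let traced show lexer lexbuf =
  let tok = lexer lexbuf in
  Printf.eprintf "Saw token %s\n%!" (show tok);
  tok

(* Stand-in for the generated [token] function. It reads from a string
   and a mutable position instead of a Lexing.lexbuf. *)
let toy_token (s, pos) =
  let rec next () =
    if !pos >= String.length s then Eof
    else begin
      let c = s.[!pos] in
      incr pos;
      match c with
      | ' ' | '\t' -> next ()   (* skip blanks, like the first rule *)
      | '\n' -> Newline
      | '+' -> Plus
      | c -> Other c
    end
  in
  next ()

let () =
  let buf = ("a+b\n", ref 0) in
  let next_token = traced show_token toy_token in
  let rec loop () =
    match next_token buf with
    | Eof -> ()
    | _ -> loop ()
  in
  loop ()
```

With a real lexer you would pass the generated function and a Lexing.lexbuf instead, for example traced show_token Lexer.token, where Lexer is an assumed module name. The wrapped function has the same type as the original, so it can be handed to an ocamlyacc entry point unchanged.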

Upvotes: 5
