Dgia

Reputation: 5

OCaml guards syntax after a value

I can't quite understand the syntax used here:

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

Firstly, I don't understand what the guard (vertical line) after parser means. And secondly, I can't seem to find the relevant syntax for the condition surrounded by [< and >].

Got the code from here. Thanks in advance!

Upvotes: 0

Views: 117

Answers (2)

MdeLv

Reputation: 11

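| means "or":

  • the leading | after parser separates the alternative clauses of the parser, exactly like in a function or match expression;
  • inside (' ' | '\n' | '\r' | '\t') it builds an or-pattern (does the next char of the stream match this char or this char or ...?).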

| [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

means:

  • IF the next element of the stream (one char in this clause, but a stream pattern can be a sequence of several elements) matches "space" or "new line" or "carriage return" or "tabulation",
  • THEN consume the matching ("white") character and call lex with the rest of the stream.
  • ELSE consume nothing and try the next clause (in your example: the one filtering A to Z and a to z chars for identifiers).

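(btw, the historical "carriage return + newline" pair "\r\n" needs no extra case: '\r' and '\n' are each skipped by their own recursive call. Something like '\n\r' would not even be a valid char literal, since a char literal holds a single character.)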
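To see what this clause does without camlp4/camlp5, here is a hand-written sketch in plain OCaml. It is not what the preprocessor actually generates, and it assumes a char Seq.t instead of a char Stream.t (so that it also runs on OCaml 5, where Stream is no longer in the stdlib); skip_whitespace is a name made up for this example. The or-pattern and the fall-through case play the roles of (' ' | '\n' | '\r' | '\t') and of "try the next clause".

```ocaml
(* Sketch only: hand-written counterpart of the whitespace clause of [lex],
   over a char Seq.t (an assumption) rather than a char Stream.t. *)
let rec skip_whitespace (s : char Seq.t) : char Seq.t =
  match s () with
  (* The next char is ' ', '\n', '\r' or '\t' (an or-pattern):
     drop it and recurse on the rest, like [-> lex stream]. *)
  | Seq.Cons ((' ' | '\n' | '\r' | '\t'), rest) -> skip_whitespace rest
  (* Anything else, including the end of input: nothing is consumed and
     the sequence is returned as is, for the "next clause" to inspect. *)
  | Seq.Nil | Seq.Cons _ -> s

let () =
  let rest = skip_whitespace (String.to_seq " \r\n\tdef") in
  print_endline (String.of_seq rest) (* prints "def" *)
```

A "\r\n" pair is skipped one char per recursive call, just like in the camlp4 version.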

To be able to parse streams in OCaml with this syntax, you need (at least) the Stream and Buffer modules, and you need the camlp4 or camlp5 syntax extension system, which knows the meaning of the keywords parser, [<, ', etc. Note that Stream was removed from the stdlib in OCaml 5.0; from then on it is provided by the separate camlp-streams package. In your toplevel, you can do as follows:

#use "topfind";; (* useless if already in your ~/.ocamlinit file *)
#camlp4o;;       (* Topfind directive to load camlp4o in the Toplevel *)


# let st = Stream.of_string "OCaml";;
val st : char Stream.t = <abstr>

# Stream.next st;;
- : char = 'O'

# Stream.next st;;
- : char = 'C'
(* btw, on an empty stream Stream.next raises Exception: Stdlib.Stream.Failure, which must be handled *)
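Stream.t is imperative: each Stream.next consumes one element, which is why two identical calls return 'O' and then 'C'. The sketch below imitates that behaviour with a made-up cursor type over char Seq.t (again an assumption, so that it runs without the Stream module), and shows one way to handle the end of input: a local Empty exception plays the role of Stream.Failure, and next_opt turns it into an option. The names cursor, of_string, next, next_opt and Empty are all hypothetical.

```ocaml
(* Hypothetical stand-in for Stream.Failure. *)
exception Empty

(* A mutable cursor over a char Seq.t, imitating Stream.t's consumption. *)
type cursor = { mutable rest : char Seq.t }

let of_string (str : string) : cursor = { rest = String.to_seq str }

(* Like Stream.next: return the next char and advance, or raise Empty. *)
let next (cur : cursor) : char =
  match cur.rest () with
  | Seq.Nil -> raise Empty
  | Seq.Cons (c, tail) ->
      cur.rest <- tail;
      c

(* Handle the empty case explicitly instead of letting Empty escape. *)
let next_opt (cur : cursor) : char option =
  try Some (next cur) with Empty -> None

let () =
  let st = of_string "OCaml" in
  (* let-bindings fix the order of the two calls *)
  let first = next st in
  let second = next st in
  Printf.printf "%c %c\n" first second; (* prints "O C" *)
  ignore (next st); ignore (next st); ignore (next st);
  assert (next_opt st = None) (* the input is exhausted *)
```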


# let rec lex = parser
            | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
            | [< >] -> [< >];;
            (* just the beginning of the parser definition *)

val lex : char Stream.t -> 'a Stream.t = <fun>

Now you are up and running to deal with streams and LL(1) stream parsers. The example you mentioned works well. If you play within the toplevel, you can evaluate the token.ml and lexer.ml files with the #mod_use directive, which wraps each file in a module named after it (#mod_use "token.ml"), so that names such as Token.token are respected. Or you can directly evaluate the expressions of lexer.ml if you nest the type token in a module Token.

# let rec lex = parser (* complete definition *)
val lex : char Stream.t -> Token.token Stream.t = <fun>
val lex_number : Buffer.t -> char Stream.t -> Token.token Stream.t = <fun>
val lex_ident : Buffer.t -> char Stream.t -> Token.token Stream.t = <fun>
val lex_comment : char Stream.t -> Token.token Stream.t = <fun>

# let pgm =
  "def fib(x) \
     if x < 3 then \
    1 \
  else \
    fib(x-1)+fib(x-2)";;
val pgm : string = "def fib(x) if x < 3 then 1 else fib(x-1)+fib(x-2)"
# let cs' = lex (Stream.of_string pgm);;
val cs' : Token.token Stream.t = <abstr>
# Stream.next cs';;
- : Token.token = Token.Def
# Stream.next cs';;
- : Token.token = Token.Ident "fib"
# Stream.next cs';;
- : Token.token = Token.Kwd '('
# Stream.next cs';;
- : Token.token = Token.Ident "x"
# Stream.next cs';;
- : Token.token = Token.Kwd ')'

You get the expected stream of type token.
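For comparison, here is a rough hand-written version of such a lexer in plain OCaml, without camlp4/camlp5. It is a sketch under several assumptions: the token type below only mirrors what the tutorial's Token module seems to contain (Def, Ident and Kwd appear in the session above; Extern and Number are guesses), it reads a char Seq.t instead of a char Stream.t, it does not handle comments, and it returns an eager token list rather than a lazy Token.token Stream.t. The helper names is_alpha, is_digit and take_while are made up for this example.

```ocaml
(* Assumed mirror of the tutorial's Token.token; Extern and Number are guesses. *)
type token =
  | Def
  | Extern
  | Ident of string
  | Number of float
  | Kwd of char

let is_alpha c = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
let is_digit c = c >= '0' && c <= '9'

(* Accumulate into [buf] the longest prefix of [s] whose chars satisfy [p];
   return the collected string and the unconsumed rest. *)
let rec take_while p buf (s : char Seq.t) : string * char Seq.t =
  match s () with
  | Seq.Cons (c, rest) when p c ->
      Buffer.add_char buf c;
      take_while p buf rest
  | Seq.Nil | Seq.Cons _ -> (Buffer.contents buf, s)

(* One match case per "parser clause"; the first char decides which one. *)
let rec lex (s : char Seq.t) : token list =
  match s () with
  | Seq.Nil -> []
  | Seq.Cons ((' ' | '\n' | '\r' | '\t'), rest) -> lex rest
  | Seq.Cons (c, _) when is_alpha c ->
      let id, rest =
        take_while (fun c -> is_alpha c || is_digit c) (Buffer.create 16) s
      in
      let tok =
        match id with "def" -> Def | "extern" -> Extern | _ -> Ident id
      in
      tok :: lex rest
  | Seq.Cons (c, _) when is_digit c || c = '.' ->
      let num, rest =
        take_while (fun c -> is_digit c || c = '.') (Buffer.create 16) s
      in
      Number (float_of_string num) :: lex rest
  | Seq.Cons (c, rest) -> Kwd c :: lex rest

let () =
  let pgm = "def fib(x) if x < 3 then 1 else fib(x-1)+fib(x-2)" in
  match lex (String.to_seq pgm) with
  | Def :: Ident "fib" :: Kwd '(' :: Ident "x" :: Kwd ')' :: _ ->
      print_endline "starts like the toplevel session"
  | _ -> print_endline "unexpected"
```

The eager list keeps the sketch short; the stream-based version in the tutorial produces tokens on demand instead.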

Now a few technical words about camlp4 and camlp5.

It's indeed recommended not to use the so-called "camlp4", which is being deprecated, but to use "camlp5" instead, which is in fact the "genuine camlp4" (see below). If you want to use LL(1) stream parsers, you can use the following camlp5 toplevel directives instead of the camlp4 one:

#require "camlp5";;   (* Topfind directive: adds the path + loads the package *)
#load "camlp5o.cma";; (* patch: manually loads the camlp5o module,
                         because #require forgets to do it (why?);
                         the "o" in "camlp5o" stands for "original syntax" *)

# let rec lex = parser
            | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
            | [< >] -> [< >];;

val lex : char Stream.t -> 'a Stream.t = <fun>

More history about camlp4 and camlp5.

Disclaimer: while I try to be as neutral and factual as possible, this (too short) explanation may also reflect my personal opinion. Of course, discussion is welcome. As an OCaml beginner, I found camlp4 very attractive and powerful, but it was not easy to tell what exactly camlp4 was and to find its most recent documentation.

In brief: it's an old and confusing story, mainly because of the name "camlp4". camlp4 is a/the historical syntax extension system for OCaml. Someone decided to improve/retrofit camlp4 around 2006, but some design decisions seem to have turned it into something that some people consider a "beast" (often, less is more). So it works, but "there is a lot of stuff under the hood" (its signature became very large). Its original author, Daniel de Rauglaudre, decided to keep developing camlp4 his own way and renamed it "camlp5" to distinguish it from the "new camlp4" (still named camlp4).

Even if camlp5 is not widely used, it's still maintained, operational and used, for example by Coq, which recently integrated a part of camlp5 instead of depending on the whole camlp5 library (which doesn't mean that "Coq doesn't use camlp5 anymore", as you could read). ppx has become the mainstream syntax extension technology in the OCaml world (it seems to be dedicated to "limited and reliable" OCaml syntax extensions, mainly small and very useful code generation such as helper functions; that's a side discussion). That doesn't mean camlp5 is "deprecated". camlp5 is certainly misunderstood. I had a hard time at the beginning, mainly because of its documentation. I wish I could have read this post at that time! Anyway, when programming in OCaml, I believe it's a good thing to explore all kinds of technologies. It's up to you to form your own opinion.

So, what is called "camlp4" today is in fact an "old" camlp4 too: the "new camlp4 of the past", i.e. the 2006 rework (I know, it's complicated). LR-based parser generators such as ocamlyacc (LALR(1)) or Menhir (LR(1)) are, or have become, mainstream. They take a bottom-up approach: you write a lexer specification (.mll, for ocamllex) and a grammar (.mly), then compile them to OCaml code. LL(1) parsers, such as camlp4/camlp5 stream parsers, take a top-down approach, very close to functional style. The best thing is to compare them yourself. Implementing a lexer/parser for your own language is perfect for that: with ocamllex/Menhir and with ocamllex/camlp5, or even with camlp5 alone, since it can also act as a lexer (with pros and cons).
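To make the "top-down, close to functional style" point concrete, here is a minimal hand-written LL(1) recursive-descent parser in plain OCaml, with no camlp4/camlp5 and no parser generator. The toy grammar (expr ::= term ('+' term)*, term ::= NUM ('*' NUM)*) and the tok type are made up for this illustration. Each function looks only at the first remaining token to decide what to do, which is the kind of code camlp4/camlp5 stream parsers let you write more compactly; with ocamlyacc or Menhir you would instead write the grammar in a .mly file and let the tool generate a bottom-up parser.

```ocaml
(* Toy token type, for this sketch only. *)
type tok = Num of int | Plus | Times

(* NUM: consume one number token, or fail. Returns (value, remaining tokens). *)
let num : tok list -> int * tok list = function
  | Num n :: rest -> (n, rest)
  | _ -> failwith "expected a number"

(* term ::= NUM ('*' NUM)*  (one token of lookahead decides whether to loop) *)
let rec term_tail acc = function
  | Times :: rest ->
      let n, rest = num rest in
      term_tail (acc * n) rest
  | toks -> (acc, toks)

let term toks =
  let n, rest = num toks in
  term_tail n rest

(* expr ::= term ('+' term)*  so '*' binds tighter than '+' *)
let rec expr_tail acc = function
  | Plus :: rest ->
      let v, rest = term rest in
      expr_tail (acc + v) rest
  | toks -> (acc, toks)

let expr toks =
  let v, rest = term toks in
  expr_tail v rest

let () =
  (* 2 + 3 * 4 *)
  match expr [ Num 2; Plus; Num 3; Times; Num 4 ] with
  | v, [] -> Printf.printf "%d\n" v (* prints 14 *)
  | _, _ :: _ -> print_endline "trailing tokens"
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which.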

I hope you'll enjoy your LLVM tutorial.

All technical and historical complementary comments are very welcome.

Upvotes: 1

Jeffrey Scofield

Reputation: 66818

As @glennsl says, this page uses the camlp4 preprocessor, which is considered obsolete by many in the OCaml community.

Here is a forum message from August 2019 that describes how to move from camlp4 to the more recent ppx:

The end of Camlp4

Unfortunately that doesn't really help you learn what that LLVM page is trying to teach you, which has little to do with OCaml it seems.

This is one reason I find the use of syntax extensions to be problematic. They don't have the staying power of the base language.

(On the other hand, OCaml really is a fantastic language for writing compilers and other language tools.)

Upvotes: 0
