fourglobe
fourglobe

Reputation: 31

Continuos active pattern match with list for F#

I have this pattern for a compiler, and in the expr parser I want to do this

let (|InfixOperator|_|) tokens = ...

let (|UnaryValue|_|) tokens = ...

let infixExpr tokens =
  match tokens with
  | Value v::Operator op::tokens -> ...
  | Value v::tokens -> ...

The problem is that either the tokens for the match expr have a signature of Token list list or the pattern Value (needs to receive a list since it parses unary operations) does not give back the remaining tokens, the same happens for the Operator pattern

The ugly way around it would be something like this

let (|InfixOperator|_|) tokens = ...

let (|UnaryValue|_|) tokens = ...

let infixExpr tokens =
  match tokens with
  | Value (v, Operator (op, tokens)) -> ...
  | Value (v, tokens) -> ...

Does anyone know any cleaner way of doing this with pattern matching?

Upvotes: 0

Views: 131

Answers (1)

Tomas Petricek
Tomas Petricek

Reputation: 243061

I wrote a parser for a subset of BASIC that uses pattern matching (source code is available on GitHub). This is not heavily using active patterns, but it works in two steps - first it tokenizes the input (turning list<char> into list<Token>) and then it parses the list of tokens (turning list<Token> into list<Statement>).

This way, you mostly avoid the issue with nesting, because most interesting things become just a single token. For example, for operators, the tokenization looks like:

let rec tokenize toks = function
  (* First character is number, collect all remaining number characters *)
  | c::cs when isNumber c -> number toks [c] cs
  (* First character is operator, collect all remanining operator characters *)
  | c::cs when isOp c -> operator toks [c] cs
  (* more cases omitted *)
  | [] -> List.rev toks

and number toks acc = function
  | c::cs when isNumber c -> number toks (c::acc) cs
  | input -> tokenize (Number(float (str acc))::toks) input

and operator toks acc = function
  | c::cs when isOp c -> operator toks (c::acc) cs
  | input -> tokenize (Operator(List.rev acc)::toks) input

If you now have a list of tokens, then parsing a binary expression becomes:

let rec parseBinary left = function
  | (Operator o)::toks -> 
      let right, toks = parseExpr toks
      Binary(o, left, right), toks
  | toks -> left, toks

and parseExpr = function
  | (Number n)::toks -> parseBinary (Const(NumberValue n)) toks
  (* more cases omitted *)

There are some places where I used active patterns, for example to parse a BASIC range expression which can be N1-N2 or -N1 or N1-

let (|Range|_|) = function
  | (Number lo)::(Operator ['-'])::(Number hi)::[] -> 
      Some(Some (int lo), Some (int hi))
  | (Operator ['-'])::(Number hi)::[] -> Some(None, Some(int hi))
  | (Number lo)::(Operator ['-'])::[] -> Some(Some(int lo), None)
  | [] -> Some(None, None)
  | _ -> None

So, I guess the answer is that if you want to write a simple parser using pattern matching (without getting into more sophisticated and more powerful monadic parser libraries), then it is quite doable, but it is good idea to separate tokenization from parsing.

Upvotes: 2

Related Questions