Reputation: 31
I have this pattern for a compiler, and in the expr parser I want to do this
let (|InfixOperator|_|) tokens = ...
let (|UnaryValue|_|) tokens = ...
let infixExpr tokens =
match tokens with
| Value v::Operator op::tokens -> ...
| Value v::tokens -> ...
The problem is that either the tokens for the match expr have a signature of Token list list
or the pattern Value (needs to receive a list since it parses unary operations) does not give back the remaining tokens, the same happens for the Operator pattern
The ugly way around it would be something like this
let (|InfixOperator|_|) tokens = ...
let (|UnaryValue|_|) tokens = ...
let infixExpr tokens =
match tokens with
| Value (v, Operator (op, tokens)) -> ...
| Value (v, tokens) -> ...
Does anyone know any cleaner way of doing this with pattern matching?
Upvotes: 0
Views: 131
Reputation: 243061
I wrote a parser for a subset of BASIC that uses pattern matching (source code is available on GitHub). This is not heavily using active patterns, but it works in two steps - first it tokenizes the input (turning list<char>
into list<Token>
) and then it parses the list of tokens (turning list<Token>
into list<Statement>
).
This way, you mostly avoid the issue with nesting, because most interesting things become just a single token. For example, for operators, the tokenization looks like:
let rec tokenize toks = function
(* First character is number, collect all remaining number characters *)
| c::cs when isNumber c -> number toks [c] cs
(* First character is operator, collect all remanining operator characters *)
| c::cs when isOp c -> operator toks [c] cs
(* more cases omitted *)
| [] -> List.rev toks
and number toks acc = function
| c::cs when isNumber c -> number toks (c::acc) cs
| input -> tokenize (Number(float (str acc))::toks) input
and operator toks acc = function
| c::cs when isOp c -> operator toks (c::acc) cs
| input -> tokenize (Operator(List.rev acc)::toks) input
If you now have a list of tokens, then parsing a binary expression becomes:
let rec parseBinary left = function
| (Operator o)::toks ->
let right, toks = parseExpr toks
Binary(o, left, right), toks
| toks -> left, toks
and parseExpr = function
| (Number n)::toks -> parseBinary (Const(NumberValue n)) toks
(* more cases omitted *)
There are some places where I used active patterns, for example to parse a BASIC range expression which can be N1-N2
or -N1
or N1-
let (|Range|_|) = function
| (Number lo)::(Operator ['-'])::(Number hi)::[] ->
Some(Some (int lo), Some (int hi))
| (Operator ['-'])::(Number hi)::[] -> Some(None, Some(int hi))
| (Number lo)::(Operator ['-'])::[] -> Some(Some(int lo), None)
| [] -> Some(None, None)
| _ -> None
So, I guess the answer is that if you want to write a simple parser using pattern matching (without getting into more sophisticated and more powerful monadic parser libraries), then it is quite doable, but it is good idea to separate tokenization from parsing.
Upvotes: 2