Reputation: 78262
I've been working with FParsec lately and I found that the lack of generic parsers is a major stopping point for me. My goal for this little library is simplicity as well as support for generic input. Can you think of any additions that would improve this or is anything particularly bad?
open LazyList type State<'a, 'b> (input:LazyList<'a>, data:'b) = member this.Input = input member this.Data = data type Result<'a, 'b, 'c> = | Success of 'c * State<'a, 'b> | Failure of string * State<'a, 'b> type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c> let (>>=) left right state = match left state with | Success (result, state) -> (right result) state | Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state) let (<|>) left right state = match left state with | Success (_, _) as result -> result | Failure (_, _) -> right state let (|>>) parser transform state = match parser state with | Success (result, state) -> Success (transform result, state) | Failure (message, _) -> Failure (message, state) let (<?>) parser errorMessage state = match parser state with | Success (_, _) as result -> result | Failure (_, _) -> Failure (errorMessage, state) type ParseMonad() = member this.Bind (f, g) = f >>= g member this.Return x s = Success(x, s) member this.Zero () s = Failure("", s) member this.Delay (f:unit -> Parser<_,_,_>) = f() let parse = ParseMonad()
Backtracking
Surprisingly it didn't take too much code to implement what you describe. It is a bit sloppy but seems to work quite well.
let (>>=) left right state =
seq {
for res in left state do
match res with
| Success(v, s) ->
let v =
right v s
|> List.tryFind (
fun res ->
match res with
| Success (_, _) -> true
| _ -> false
)
match v with
| Some v -> yield v
| None -> ()
} |> Seq.toList
let (<|>) left right state =
left state @ right state
Backtracking Part 2
Switched around the code to use lazy lists and tail-call optimized recursion.
let (>>=) left right state =
let rec readRight lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
let rec readLeft lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) ->
match readRight (right r s) with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> readLeft xs
| Failure (m, s) -> readLeft xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
readLeft (left state)
let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state =
LazyList.delayed (fun () -> left state)
|> LazyList.append
<| LazyList.delayed (fun () -> right state)
Upvotes: 2
Views: 394
Reputation: 243051
I think that one important design decision that you'll need to make is whether you want to support backtracking in your parsers or not (I don't remember much about parsing theory, but this probably specifies the types of languages that your parser can handle).
Backtracking. In your implementation, a parser can either fail (the Failure
case) or produce exactly one result (the Success
case). An alternative option is to generate zero or more results (for example, represent results as seq<'c>
). Sorry if this is something you already considered :-), but anyway...
The difference is that your parser always follows the first possible option. For example, if you write something like the following:
let! s1 = (str "ab" <|> str "a")
let! s2 = str "bcd"
Using your implementation, this will fail for input "abcd". It will choose the first branch of the <|>
operator, which will then process first two characters and the next parser in the sequence will fail. An implementation based on sequences would be able to backtrack and follow the second path in <|>
and parse the input.
Combine. Another idea that occurs to me is that you could also add Combine
member to your parser computation builder. This is a bit subtle (because you need to understand how computation expressions are translated), but it can be sometimes useful. If you add:
member x.Combine(a, b) = a <|> b
member x.ReturnFrom(p) = p
You can then write recursive parsers nicely:
let rec many p acc =
parser { let! r = p // Parse 'p' at least once
return! many p (r::acc) // Try parsing 'p' multiple times
return r::acc |> List.rev } // If fails, return the result
Upvotes: 2