Is this a reasonable foundation for a parser combinator library?

Question

I've been working with FParsec lately and I found that the lack of generic parsers is a major stopping point for me. My goal for this little library is simplicity as well as support for generic input. Can you think of any additions that would improve this or is anything particularly bad?

open LazyList 

type State<'a, 'b> (input:LazyList<'a>, data:'b) =
    member this.Input = input
    member this.Data = data

type Result<'a, 'b, 'c> =
| Success of 'c * State<'a, 'b>
| Failure of string * State<'a, 'b>

type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c>

let (>>=) left right state =
    match left state with
    | Success (result, state) -> (right result) state
    | Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state)

let (<|>) left right state =
    match left state with
    | Success (_, _) as result -> result
    | Failure (_, _) -> right state

let (|>>) parser transform state =
    match parser state with
    | Success (result, state) -> Success (transform result, state)
    | Failure (message, _) -> Failure (message, state)

let () parser errorMessage state =
    match parser state with
    | Success (_, _) as result -> result
    | Failure (_, _) -> Failure (errorMessage, state)                     

type ParseMonad() =
    member this.Bind (f, g) = f >>= g
    member this.Return x s = Success(x, s)
    member this.Zero () s = Failure("", s)                           
    member this.Delay (f:unit -> Parser<_,_,_>) = f()

let parse = ParseMonad()

Backtracking

Surprisingly it didn't take too much code to implement what you describe. It is a bit sloppy but seems to work quite well.

let (>>=) left right state =
    seq {
        for res in left state do
            match res with
            | Success(v, s) ->
                let v  = 
                    right v s 
                    |> List.tryFind (
                        fun res -> 
                            match res with 
                            | Success (_, _) -> true 
                            | _ -> false
                    ) 
                match v with
                | Some v -> yield v
                | None -> ()
    } |> Seq.toList

let (<|>) left right state = 
    left state @ right state

Backtracking Part 2

Switched around the code to use lazy lists and tail-call optimized recursion.

let (>>=) left right state =
    let rec readRight lst =
        match lst with
        | Cons (x, xs) ->
            match x with
            | Success (r, s) as q -> LazyList.ofList [q]                     
            | Failure (m, s) -> readRight xs
        | Nil -> LazyList.empty>
    let rec readLeft lst =
        match lst with
        | Cons (x, xs) ->
            match x with
            | Success (r, s) -> 
                match readRight (right r s) with 
                | Cons (x, xs) ->
                    match x with
                    | Success (r, s) as q -> LazyList.ofList [q]                     
                    | Failure (m, s) -> readRight xs
                | Nil -> readLeft xs   
            | Failure (m, s) -> readLeft xs
        | Nil -> LazyList.empty>
    readLeft (left state)

let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state = 
    LazyList.delayed (fun () -> left state) 
    |> LazyList.append 
    <| LazyList.delayed (fun () -> right state)

Tomas Petricek · Accepted Answer

I think that one important design decision that you'll need to make is whether you want to support backtracking in your parsers or not (I don't remember much about parsing theory, but this probably specifies the types of languages that your parser can handle).

Backtracking. In your implementation, a parser can either fail (the Failure case) or produce exactly one result (the Success case). An alternative option is to generate zero or more results (for example, represent results as seq<'c>). Sorry if this is something you already considered :-), but anyway...

The difference is that your parser always follows the first possible option. For example, if you write something like the following:

let! s1 = (str "ab" <|> str "a")
let! s2 = str "bcd"

Using your implementation, this will fail for input "abcd". It will choose the first branch of the <|> operator, which will then process first two characters and the next parser in the sequence will fail. An implementation based on sequences would be able to backtrack and follow the second path in <|> and parse the input.

Combine. Another idea that occurs to me is that you could also add Combine member to your parser computation builder. This is a bit subtle (because you need to understand how computation expressions are translated), but it can be sometimes useful. If you add:

member x.Combine(a, b) = a <|> b
member x.ReturnFrom(p) = p

You can then write recursive parsers nicely:

let rec many p acc = 
  parser { let! r = p                  // Parse 'p' at least once
           return! many p (r::acc)     // Try parsing 'p' multiple times
           return r::acc |> List.rev } // If fails, return the result

Is this a reasonable foundation for a parser combinator library?

Answers (1)

Related Questions