Reputation: 799
The Earley
parsing library is great for writing linguistic parsers in Haskell. CFGs can be specified in an intuitive way, and there is excellent support for backtracking and ambiguity. A simple example:
{-# LANGUAGE OverloadedStrings #-}
import Text.Earley
np = rule ("John" <|> "Mary")
vp = rule ("runs" <|> "walks")
sentence = do
subj <- np
pred <- vp
return $ (++) <$> subj <*> pred
sentence
can be used to parse ["John", "runs"]
or ["Mary", "walks"]
, among other inputs.
It would be nice to be able to use Earley
to write parsers for FCFGs, where nonterminals are complexes of a label and a feature bundle, and feature matching can happen via unification (for example, the Earley parser in NLTK parses FCFGs). However, it is not clear how to do this using Earley
, or whether it can even be done. An example of something we might want in something like BNF:
np[sg] ::= "John" | "Mary"
np[?x] ::= det n[?x]
n[pl] ::= "boys" | "girls"
det ::= "the"
vp[sg] ::= "runs" | "walks"
vp[pl] ::= "run" | "walk"
s ::= np[?x] vp[?x]
Under this FCFG, ["John", "runs"]
is an s
(since their number features match, as required by the s
rule), and ["the", "boys", "walks"]
isn't an s
(since ["the", "boys"]
parses to np[pl]
and ["walks"]
parses to vp[sg]
).
One can in general rewrite an FCFG into an equivalent CFG, but this can be highly inconvenient, and result in a blowup of the grammar, especially when we have many possible features ranging over many possible values.
Upvotes: 1
Views: 152
Reputation: 152957
You're not actually doing any particularly interesting unification here, so perhaps it's enough to toss a very simple nondeterminism applicative of your own into the mix. The standard one is []
, but for this case, even Maybe
looks like enough. Like this:
{-# Language OverloadedStrings #-}
{-# Language TypeApplications #-}
import Control.Applicative
import Control.Monad
import Data.Foldable
import Text.Earley
data Feature = SG | PL deriving (Eq, Ord, Read, Show)
(=:=) :: (Feature, a) -> (Feature, b) -> Maybe (a, b)
(fa, a) =:= (fb, b) = (a, b) <$ guard (fa == fb)
data NP = Name String | Determined String String deriving (Eq, Ord, Read, Show)
np :: Grammar r (Prod r e String (Feature, NP))
np = rule . asum $
[ fmap (\name -> (SG, Name name)) ("John" <|> "Mary")
, liftA2 (\det n -> (PL, Determined det n)) "the" ("boys" <|> "girls")
]
vp :: Grammar r (Prod r e String (Feature, String))
vp = rule . asum $
[ (,) SG <$> ("runs" <|> "walks")
, (,) PL <$> ("run" <|> "walk")
]
s :: Grammar r (Prod r e String (Maybe (NP, String)))
s = liftA2 (liftA2 (=:=)) np vp
test :: [String] -> IO ()
test = print . allParses @() (parser s)
Try it out in ghci:
> sequence_ [test (words n ++ [v]) | n <- ["John", "the boys"], v <- ["walks", "walk"]]
([(Just (Name "John","walks"),2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,3)],Report {position = 3, expected = [], unconsumed = []})
([(Just (Determined "the" "boys","walk"),3)],Report {position = 3, expected = [], unconsumed = []})
So, the result needs a bit of interpretation -- a successful parse of Nothing
really counts as a failed parse -- but perhaps that's not so bad? Not sure. Certainly it's unfortunate that you don't get to reuse Earley
's error-reporting and nondeterminism machinery. Probably to get either thing, you'd have to fork Earley
.
If you need to do real unification you could look into returning a IntBindingT t Identity
instead of a Maybe
, but at least until your features are themselves recursive this is probably enough and much, much simpler.
Upvotes: 2