Reputation: 1479
I was reading this Monadic Parsing article while I was trying to implement a pretty simple string parser in Haskell and also get a better understanding of using monads. Down below you can see my code, implementing functions for matching a single character or a whole string. It works as expected, but I observed two strange behaviors that I can't explain.
I have to handle single characters in string
, otherwise, the parser will return only empty lists. To be exact, if I remove this line string [c] = do char c; return [c]
it won't work anymore. I was expecting that string (c:s)
would handle string (c:[])
properly. What could be the cause here?
In my opinion, string
definition should be equivalent to string s = mapM char s
as it would create a list of [Parser Char]
for each character in s
and collect the results as Parser [Char]
. If I use the definition based on mapM
, the program would get stuck in an infinite loop and won't print anything. Is something about lazy evalutation that I miss here?
.
module Main where
newtype Parser a = Parser { apply :: String->[(a, String)] }
instance Monad Parser where
return a = Parser $ \s -> [(a, s)]
ma >>= k = Parser $ \s -> concat [apply (k a) s' | (a, s') <- apply ma s]
instance Applicative Parser where
pure = return
mf <*> ma = do { f <- mf; f <$> ma; }
instance Functor Parser where
fmap f ma = f <$> ma
empty :: Parser a
empty = Parser $ const []
anychar :: Parser Char
anychar = Parser f where
f [] = []
f (c:s) = [(c, s)]
satisfy :: (Char -> Bool) -> Parser Char
satisfy prop = do
c <- anychar
if prop c then return c
else empty
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = empty
string [c] = do char c; return [c] --- if I remove this line, all results will be []
string (c:s) = do char c; string s; return (c:s)
main = do
let s = "12345"
print $ apply (string "123") s
print $ apply (string "12") s
print $ apply (string "1") s
print $ apply (string []) s
PS. I think the title of the question is not suggestive enough, please propose an edit if you have a better idea.
Upvotes: 2
Views: 166
Reputation: 48662
string [] = empty
instead of string [] = return []
, you can't use it as a base case for recursion that builds up a list.fmap f ma = f <$> ma
is wrong, since <$>
is defined in terms of fmap
. If you want to define fmap
in terms of your other instances, then do fmap = liftA
or fmap = liftM
. Since mapM
uses fmap
internally but your original string
didn't, this problem didn't come up in your first simple test.Upvotes: 2
Reputation: 116174
string [] = empty
means: "If you need to parse an empty string, fail -- it can not be parsed at all, no matter what's the input string".
By comparison,
string [] = return ""
means: "If you need to parse an empty string, succeed and return the empty string -- it can always be parsed, no matter what's the input string".
By using the first equation, when you recurse in the case string (c:cs)
you need to stop at one character (string [c]
) since reaching zero characters will run empty
and make the whole parser fail.
Hence, you need to either use that string [c] = return [c]
equation, or modify the base "empty string" case so that it succeeds. Arguably, the latter would be more natural.
Upvotes: 2