Edgar Smith
Edgar Smith

Reputation: 183

Stop Parser From Accepting Extra Characters

I have a very simple parser with pretty standard parsing functions used with Haskell. I have the parser recognizing what I want, but the problem is that is accepts any extra input after what I'm looking for has been recognized. The simple example code I'll give should recognize a string "x", then return, Just 'x', or return Nothing for an input string other than "x". Given input "x", it returns Just 'x', as it should, given the string "d", it returns Nothing, as it should, given "dx", it returns Nothing, as it should, but if given "xd", it will return Just 'x', while I want it to return Nothing.

newtype Parser a = P (String -> [(a,String)])

instance Functor Parser where
   fmap g p = P (\inp -> case parse p inp of
                            []        -> []
                            [(v,out)] -> [(g v, out)])

instance Applicative Parser where
   pure v = P (\inp -> [(v,inp)])
   pg <*> px = P (\inp -> case parse pg inp of
                             []        -> []
                             [(g,out)] -> parse (fmap g px) out)

instance Monad Parser where
   p >>= f = P (\inp -> case parse p inp of
                           []        -> []
                           [(v,out)] -> parse (f v) out)

parse (P p) inp = p inp

item :: Parser Char
item = P (\inp -> case inp of
                     []     -> []
                     (x:xs) -> [(x,xs)])

sat :: (Char -> Bool) -> Parser Char
sat p = do x <- item
           if p x then return x else empty

empty = P (\inp -> [])

test = do { x <- sat (\x -> x == 'x'); return x}

run s = case parse test s of
             [(a, _)] -> Just a
             otherwise -> Nothing

given:

run "xd"

returns:

Just 'x'

My most reasonable attempt was to try to recognize anything that's not an alpha-numeric character, but that just seems to make the parser try to parse beyond the string length and it returns Nothing. Is there a standard way of dealing with this?

Upvotes: 2

Views: 319

Answers (1)

Hjulle
Hjulle

Reputation: 2570

The fix is actually really simple. Just check that the remaining string after parsing is empty in the run function. Like this:

run s = case parse test s of
             [(a, "")] -> Just a
             otherwise -> Nothing

Now any spurious characters will cause the function to return Nothing as you want.

An alternative is to split the check into a separate eof parser that succeeds if the string is empty and fails if there are any characters left and add that to the end of your parser.

Upvotes: 6

Related Questions