Reputation: 41
I'm studying functional programming using Haskell language. And as an exercise I need to implement a function parsing a primitive arithmetic expression from String
. The function must be able to handle double literals, operations +
, -
, *
, /
with the usual precedence and parentheses.
parseExpr :: String -> Except ParseError Expr
with next defined data types:
data ParseError = ErrorAtPos Natural
deriving Show
newtype Parser a = P (ExceptState ParseError (Natural, String) a)
deriving newtype (Functor, Applicative, Monad)
data Prim a
= Add a a
| Sub a a
| Mul a a
| Div a a
| Abs a
| Sgn a
deriving Show
data Expr
= Val Double
| Op (Prim Expr)
deriving Show
Where ExceptState
is a modified State
monad, allowing to throw exception pointing at the error position.
data Annotated e a = a :# e
deriving Show
infix 0 :#
data Except e a = Error e | Success a
deriving Show
data ExceptState e s a = ES { runES :: s -> Except e (Annotated s a) }
Also ExceptState
has defined Functor
, Applicative
and Monad
instances, which were thoroughly tested earlier, so I am positive in their correctness.
instance Functor (ExceptState e s) where
fmap func ES{runES = runner} = ES{runES = \s ->
case (runner s) of
Error err -> Error err
Success ans -> Success (mapAnnotated func $ ans) }
instance Applicative (ExceptState e s) where
pure arg = ES{runES = \s -> Success (arg :# s)}
p <*> q = Control.Monad.ap p q
instance Monad (ExceptState e s) where
m >>= f = joinExceptState (fmap f m)
where
joinExceptState :: ExceptState e s (ExceptState e s a) -> ExceptState e s a
joinExceptState ES{runES = runner} = ES{runES = \s ->
case (runner s) of
Error err -> Error err
Success (ES{runES = runner2} :# s2) ->
case (runner2 s2) of
Error err -> Error err
Success (res :# s3) -> Success (res :# s3) }
To implement the function parseExpr
I used basic parser combinators:
pChar :: Parser Char
pChar = P $ ES $ \(pos, s) ->
case s of
[] -> Error (ErrorAtPos pos)
(c:cs) -> Success (c :# (pos + 1, cs))
parseError :: Parser a
parseError = P $ ES $ \(pos, _) -> Error (ErrorAtPos pos)
instance Alternative Parser where
empty = parseError
(<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
P $ ES $ \(pos, s) ->
case runnerP (pos, s) of
Error _ -> runnerQ (pos, s)
Success res -> Success res
instance MonadPlus Parser
which were used to construct more complex ones:
-- | elementary parser not consuming a character, failing if input doesn't
-- reach its end
pEof :: Parser ()
pEof = P $ ES $ \(pos, s) ->
case s of
[] -> Success (() :# (pos, []))
_ -> Error $ ErrorAtPos pos
-- | parses a single digit value
parseVal :: Parser Expr
parseVal = Val <$> (fromIntegral . digitToInt) <$> mfilter isDigit pChar
-- | parses an expression inside parenthises
pParenth :: Parser Expr
pParenth = do
void $ mfilter (== '(') pChar
expr <- parseAddSub
(void $ mfilter (== ')') pChar) <|> parseError
return expr
-- | parses the most prioritised operations
parseTerm :: Parser Expr
parseTerm = pParenth <|> parseVal
parseAddSub :: Parser Expr
parseAddSub = do
x <- parseTerm
ys <- many parseSecond
return $ foldl (\acc (sgn, y) -> Op $
(if sgn == '+' then Add else Sub) acc y) x ys
where
parseSecond :: Parser (Char, Expr)
parseSecond = do
sgn <- mfilter ((flip elem) "+-") pChar
y <- parseTerm <|> parseError
return (sgn, y)
-- | Parses the whole expression. Begins from parsing on +, - level and
-- successfully consuming the whole string.
pExpr :: Parser Expr
pExpr = do
expr <- parseAddSub
pEof
return expr
-- | More convinient way to run 'pExpr' parser
parseExpr :: String -> Except ParseError Expr
parseExpr = runP pExpr
As a result, at this point function works as intended if given String
expression is valid:
ghci> parseExpr "(2+3)-1"
Success (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0)))
ghci> parseExpr "(2+3-1)-1"
Success (Op (Sub (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0))) (Val 1.0)))
Otherwise ErrorAtPos
does not point at the necessary position:
ghci> parseExpr "(2+)-1"
Error (ErrorAtPos 1)
ghci> parseExpr "(2+3-)-1"
Error (ErrorAtPos 1)
What am I doing wrong here? Thank you in advance.
My main assumption was that something wrong was with function (<|>)
of Alternative Parser
and it incorrectly changed pos
variable.
(<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
P $ ES $ \(pos, s) ->
case runnerP (pos, s) of
-- Error _ -> runnerQ (pos, s)
Error (ErrorAtPos pos') -> runnerQ (pos' + pos, s)
Success res -> Success res
But it led to more strange results:
ghci> parseExpr "(5+)-3"
Error (ErrorAtPos 84)
ghci> parseExpr "(5+2-)-3"
Error (ErrorAtPos 372)
Then more doubts were aimed at joinExceptState
function of instance Monad (ExceptState e s)
in spite of everything I've run it through, doubts that it wasn't working on s
of (Natural, String)
type as I indented in this case. But then I can't really change it for this concrete type only.
Upvotes: 4
Views: 196
Reputation: 670
Excellent question, although it would have been even better if it really included all your code. I filled in the missing pieces:
mapAnnotated :: (a -> b) -> Annotated s a -> Annotated s b
mapAnnotated f (a :# e) = (f a) :# e
runP :: Parser a -> String -> Except ParseError a
runP (P (ES {runES = p})) s = case p (0, s) of
Error e -> Error e
Success (a :# e) -> Success a
Why is parseExpr "(5+)-3"
equal to Error (ErrorAtPos 1)
? Here's what happens: we call parseExpr
which (ultimately) calls parseTerm
which is just pParenth <|> parseVal
. pParenth
fails, of course, so we look at the definition of <|>
to work out what to do. That definition says: if the thing on the left fails, try the thing on the right. So we try the thing on the right (i. e. parseVal
), which also fails, and we report the second error, which is in fact at position 1.
To see this more clearly, you can just replace pParenth <|> parseVal
with parseVal <|> pParenth
and observe that you get ErrorAtPos 2
instead.
This is almost certainly not the behaviour you want. The documentation of Megaparsec's p <|> q
, here, says:
If [parser] p fails without consuming any input, parser q is tried.
(emphasis in original, meaning: parser q is not tried in other cases). This is a more useful thing to do. If you got reasonably far trying to parse a parenthesised expression and then got an error, probably you want to report that error rather than complaining that '(' isn't a digit.
Since you say this is an exercise, I'm not going to tell you how to fix the problem. I'll tell you some other stuff, though.
First, this is not your only issue with error reporting. Above we see that parseVal "(1"
reports an error at position 1 (after the problematic character, which is at position 0) whereas pParenth "(5+)-3"
reports an error at position 2 (before the problematic character, which is at position 3). Ideally, both should give the position of the problematic character itself. (Of course, it'd be even better if the parser stated what character it expected, but that's more difficult to do.)
Second, the way I found the problem was to import Debug.Trace
, replace your definition of pChar
with
pChar :: Parser Char
pChar = P $ ES $ \(pos, s) -> traceShow (pos, s) $
case s of
[] -> Error (ErrorAtPos pos)
(c:cs) -> Success (c :# (pos + 1, cs))
and mull over the output for a bit. Debug.Trace is sometimes less useful than one hopes, because of lazy evaluation, but for a program like this it can help a lot.
Third, if you modify your definition of <|>
to match Megaparsec's does, you might need Megaparsec's try
combinator. (Not for the grammar you're trying to parse now, but maybe later.) try
solves the issue that
(singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')
fails on the string "pr" with Megaparsec's <|>
.
Fourth, you sometimes write someParser <|> parseError
, which I think is equivalent to someParser
for both your definition of <|>
and Megaparsec's.
Fifth, you don't need void
; just ignore the result, it's the same thing.
Sixth, your Except
seems to just be Either
.
Upvotes: 2