Reputation: 2624
I want to parse a language like this
foo = (bar, bar1 = (bar2 = bar4), bar5)
I wrote a simple parser
module SimpleParser where
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import Text.Parsec
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Char
import Prelude
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
where
style = emptyDef {
Tok.identLetter = alphaNum
}
parens :: Parser a -> Parser a
parens = Tok.parens lexer
commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer
identifier :: Parser String
identifier = Tok.identifier lexer
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
data Expr = Ident String | Label String Expr | ExprList [Expr] deriving (Eq, Ord, Show)
parseExpr :: String -> Either ParseError Expr
parseExpr s = parse expr "" s
expr :: Parser Expr
expr = parens expr
<|> try exprList
<|> ident
ident :: Parser Expr
ident = do
var <- identifier
return $ Ident var
exprLabel :: Parser Expr
exprLabel = do
var <- identifier
reservedOp "="
body <- expr
return $ Label var body
exprList :: Parser Expr
exprList = do
list <- commaSep (try exprLabel <|> expr)
return $ ExprList list
But even with the following simple input it has an infinite loop:
test = parseExpr "foo = bar"
Could someone explain why it doesn't work and how I can fix it?
Upvotes: 1
Views: 265
Reputation: 9891
Thing is, in your code, exprList
will loop if it tries to
parse an identifier, that is parse exprList "" "foo"
goes into
an infinite loop. This is because it tries to parse it as a list
of either Labels or expressions, and expressions can be lists. Once
it fails to be a exprLabel
it tries to see if it can be a expr
and so
it calls exprList
again.
To fix it you need to make sure that expr
checks to see both if it is
a exprLabel
or an identifier
before it tries exprList
. Note that if
all the above fails it will still go into a loop. This is because it doesn't know if this is just the start of a list (or a list of lists of lists of lists...) or not.
To fix this you can make expr
only call exprList
if it matches a parens
, and use exprList
as the starting Parser
.
expr :: Parser Expr
expr = parens (exprList)
<|> try exprLabel
<|> ident
exprList :: Parser Expr
exprList = do
list <- commaSep expr
return $ ExprList list
And it works like this:
>parse exprList "" "(foo=bar),foo=bar"
Right (ExprList [ExprList [Label "foo" (Ident "bar")],Label "foo" (Ident "bar")])
Upvotes: 3