Reputation: 1463
I have a logical language defined by the following BNF.
formula ::= true
| false
| var
| formula & formula
| [binder] formula
binder ::= var
| $var
Essentially, this allows for formulae such as x & true
, [x]x
and [$x](x & true)
. The semantics are not important here; but the essential thing is that I have these square bracketed expressions appearing before formulae, and inside those square bracketed expressions, identifiers may or may not be preceded by a dollar sign ($
). Now I have used Haskell's Parsec library to help me construct a parser for this language, detailed below.
module LogicParser where
import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token
-- Data Structures
data Formula = LVar String
| TT
| FF
| And Formula Formula
| Bound Binder Formula
deriving Show
data Binder = BVar String
| FVar String
deriving Show
-- Language Definition
lang :: LanguageDef st
lang =
emptyDef{ Token.identStart = letter
, Token.identLetter = alphaNum
, Token.reservedOpNames = ["&", "$", "[", "]"]
, Token.reservedNames = ["tt", "ff"]
}
-- Lexer for langauge
lexer =
Token.makeTokenParser lang
-- Trivial Parsers
identifier = Token.identifier lexer
keyword = Token.reserved lexer
op = Token.reservedOp lexer
roundBrackets = Token.parens lexer
whiteSpace = Token.whiteSpace lexer
-- Main Parser, takes care of trailing whitespaces
formulaParser :: Parser Formula
formulaParser = whiteSpace >> formula
-- Parsing Formulas
formula :: Parser Formula
formula = andFormula
<|> formulaTerm
-- Term in a Formula
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> lvarFormula
<|> boundFormula
-- Conjunction
andFormula :: Parser Formula
andFormula =
buildExpressionParser [[Infix (op "&" >> return And) AssocLeft]] formulaTerm
-- Bound Formula
boundFormula :: Parser Formula
boundFormula =
do op "["
v <- var
op "]"
f <- formulaTerm
return $ Bound v f
-- Truth
ttFormula :: Parser Formula
ttFormula = keyword "tt" >> return TT
-- Falsehood
ffFormula :: Parser Formula
ffFormula = keyword "ff" >> return FF
-- Logical Variable
lvarFormula :: Parser Formula
lvarFormula =
do v <- identifier
return $ LVar v
-- Variable
var :: Parser Binder
var = try bvar <|> fvar
-- Bound Variable
bvar :: Parser Binder
bvar =
do op "$"
v <- identifier
return $ BVar v
-- Free Variable
fvar :: Parser Binder
fvar =
do v <- identifier
return $ FVar v
-- For testing
main :: IO ()
main = interact (unlines . (map stringParser) . lines)
stringParser :: String -> String
stringParser s =
case ret of
Left e -> "Error: " ++ (show e)
Right n -> "Interpreted as: " ++ (show n)
where
ret = parse formulaParser "" s
My issue is the following. When the dollar sign operator ($
) touches the square bracket, I get an error, whereas if I add a space, the parser works fine:
How can I get the parser to recognise [$x](x & true)
? Note that it has no issue with &
touching its operands, only when the two operators [
and $
touch.
Upvotes: 1
Views: 799
Reputation: 16145
Here's how I would write your parser with Megaparsec (docs, tutorial):
import Text.Megaparsec
import qualified Text.Megaparsec.Char as C
import qualified Text.Megaparsec.Char.Lexer as L
import Control.Monad.Combinators.Expr
import Data.Void
type Parser = Parsec Void String
data Formula = LVar String
| TT
| FF
| Not Formula -- Added to demonstrate `Prefix` of `makeExprParser`
| And Formula Formula
| Bound Binder Formula
deriving Show
data Binder = BVar String
| FVar String
deriving Show
Megaparsec also has makeExprParser
, but it's moved into a separate parser-combinators
package:
formula :: Parser Formula
formula = makeExprParser term operators
where
term = choice
[ TT <$ symbol "true"
, FF <$ symbol "false"
, LVar <$> var
, Bound <$> brackets binder <*> parens formula
]
binder = choice
[ BVar <$> (C.char '$' >> var)
, FVar <$> var
]
var = lexeme $ some C.letterChar
operators :: [[Operator Parser Formula]]
operators =
[ [ Prefix (Not <$ symbol "¬") ]
, [ InfixL (And <$ symbol "&") ]
]
Some points:
<$>
, <$
, $>
) when possible.many1
to some
. There's a tutorial, Switch from Parsec to Megaparsec that aids the transition. (It might be a little outdated; I sent a PR in the process of answering this question.)try
when BVar
s and FVar
s are mutually exclusive on the first character, $
. Simply listing the BVar
parser first is enough. Since you won't find any other valid expression that begins with $
, a failed parser that consumed a $
is not a problem.Your grammar doesn't say anything about literal parentheses or literal parentheses after brackets. So in order to parse "[$x](x & true)"
you need to add explicit parentheses to the grammar, either as
formula ::= ... | '(' formula ')'
or as
formula ::= ... | '[' binder ']' '(' formula ')'
if they're only allowed there. I've gone with the latter, but this is probably wrong.
Continuing,
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer
symbol :: String -> Parser String
symbol = L.symbol spaceConsumer
spaceConsumer :: Parser ()
spaceConsumer = L.space C.space1 empty empty
brackets, parens :: Parser a -> Parser a
brackets = between (symbol "[") (symbol "]")
parens = between (symbol "(") (symbol ")")
Some last points,
between
to wrap a parser in brackets
or braces
.empty :: Alternative f => f a
.Whitespace in parser combinators is tricky. Make sure that all parsers are either lexeme parsers (e.g. symbol "foo"
, lexeme $ some C.letterChar
) or combinations of lexeme parsers. If you use lexeme
on something that already is a lexeme parser, you're doing something wrong, and if you forget to use lexeme
on something, you'll risk disallowing whitespace in places where you want to allow it.
I've not used lexeme
around C.char '$'
.
There are always corner cases, e.g. whitespace at the beginning:
> parseTest formula " [$x](x & true) "
1:1:
|
1 | [$x](x & true)
| ^^^^^
unexpected " [$x]"
expecting "false", "true", '[', '¬', or letter
If you want to thoroughly assert that your parser allows whitespace in all the right places, you can build an "ugly-printer" that, for arbitrary syntax trees, inserts arbitrary whitespace where allowed. Your property is then that parsing an ugly-printed syntax tree should yield the same syntax tree.
Megaparsec parse errors look super nice.
They can look nicer if you use the <?>
operator (also available in Parsec).
Upvotes: 1
Reputation: 1382
I think it doesn't like you having the square brackets as operators. I would try this:
"["
and "]"
from Token.reservedOpNames
squareBrackets = Token.brackets lexer
change your boundFormula
function to:
boundFormula =
do v <- squareBrackets var
f <- formulaTerm
return $ Bound v f
Upvotes: 1
Reputation: 92117
I am not very familiar with the token side of Parsec, but from its documentation I think you need to provide opLetter
and possibly opStart
, since you are providing reservedOp
:
opLetter :: ParsecT s u m Char
This parser should accept any legal tail characters of operators. Note that this parser should even be defined if the language doesn't support user-defined operators, or otherwise the reservedOp parser won't work correctly.
In particular the default opLetter
doesn't include [
or ]
, so it is behaving erratically when you think one of those should be an op.
Upvotes: 1