Reputation: 2397
I'm trying to parse (for now) a subset of the Dot language. The grammar is here and my code is the following
import System.Environment
import System.IO
import qualified Text.Parsec.Token as P
import Text.ParserCombinators.Parsec.Char -- for letter
import Text.Parsec
import qualified Control.Applicative as App
import Lib
type Id = String
data Dot = Undirected Id Stmts
| Directed Id Stmts
deriving (Show)
data Stmt = NodeStmt Node | EdgeStmt Edges
deriving (Show)
type Stmts = [Stmt]
data Node = Node Id Attributes deriving (Show)
data Edge = Edge Id Id deriving (Show)
type Edges = [Edge]
data Attribute = Attribute Id Id deriving (Show)
type Attributes = [Attribute]
dotDef :: P.LanguageDef st
dotDef = P.LanguageDef
{ P.commentStart = "/*"
, P.commentEnd = "*/"
, P.commentLine = "//"
, P.nestedComments = True
, P.identStart = letter
, P.identLetter = alphaNum
, P.reservedNames = ["node", "edge", "graph", "digraph", "subgraph", "strict" ]
, P.caseSensitive = True
, P.opStart = oneOf "-="
, P.opLetter = oneOf "->"
, P.reservedOpNames = []
}
lexer = P.makeTokenParser dotDef
brackets = P.brackets lexer
braces = P.braces lexer
identifier = P.identifier lexer
reserved = P.reserved lexer
semi = P.semi lexer
comma = P.comma lexer
reservedOp = P.reservedOp lexer
eq_op = reservedOp "="
undir_edge_op = reservedOp "--"
dir_edge_op = reservedOp "->"
edge_op = undir_edge_op <|> dir_edge_op
-- -> Attribute
attribute = do
id1 <- identifier
eq_op
id2 <- identifier
optional (semi <|> comma)
return $ Attribute id1 id2
a_list = many attribute
bracked_alist =
brackets $ option [] a_list
attributes =
do
nestedAttributes <- many1 bracked_alist
return $ concat nestedAttributes
nodeStmt = do
nodeName <- identifier
attr <- option [] attributes
return $ NodeStmt $ Node nodeName attr
dropLast = reverse . tail . reverse
edgeStmt = do
nodes <- identifier `sepBy1` edge_op
return $ EdgeStmt $ fmap (\x -> Edge (fst x) (snd x)) (zip (dropLast nodes) (tail nodes))
stmt = do
x <- nodeStmt <|> edgeStmt
optional semi
return x
stmt_list = many stmt
graphDecl = do
reserved "graph"
varName <- option "" identifier
stms <- braces stmt_list
return $ Undirected varName stms
digraphDecl = do
reserved "digraph"
varName <- option "" identifier
stms <- braces stmt_list
return $ Directed varName stms
topLevel3 = do
spaces
graphDecl <|> digraphDecl
main :: IO ()
main = do
(file:_) <- getArgs
content <- readFile file
case parse topLevel3 "" content of
Right g -> print g
Left err -> print err
Given this input
digraph PZIFOZBO{
a[toto = bar] b ; c ; w // 1
a->b // 2
}
It works fine if line 1 or line 2 is commented, but if both are enabled, it fails with
(line 3, column 10): unexpected "-" expecting identifier or "}"
My understanding it that the parser picks first matching rule (with backtracking). Here both edge and node statement starts with and identifier, so it always pick this one.
I tried reversing the order in stmt
, without any luck.
I also tried to sprinkle some try
in stmt, nodeStmt and edgeStmt, without luck either.
Any help appreciated.
Upvotes: 2
Views: 271
Reputation: 51129
Note that I get the same error whether or not line 1 is commented out, so:
digraph PZIFOZBO{
a->b
}
also says unexpected "-"
.
As I think you have correctly diagnosed, the problem here is that the stmt
parser tries nodeStmt
first. That succeeds and parses "a"
, leaving "->b"
yet to be consumed, but ->b
isn't a valid statement. Note that Parsec does not backtrack automatically in the absence of a try
, so it's not going to go back and revisit this decisions when it "discovers" that ->b
can't be parsed.
You can "fix" this problem by swapping the order in stmt
:
x <- edgeStmt <|> nodeStmt
but now the parse will break on an expression like a[toto = bar]
. That's because edgeStmt
is buggy. It parses "a"
as a valid statement EdgeStmt []
because sepBy1
allows a single edge "a"
, which isn't what you want.
If you rewrite edgeStmt
to require at least one edge:
import Control.Monad (guard)
edgeStmt = do
nodes <- identifier `sepBy1` edge_op
guard $ length nodes > 1
return $ EdgeStmt $ fmap (\x -> Edge (fst x) (snd x)) (zip (dropLast nodes) (tail nodes))
and adjust stmt
to "try
" an edge statement first and backtrack to a node statement:
stmt = do
x <- try edgeStmt <|> nodeStmt
optional semi
return x
then your example compiles fine.
Upvotes: 2