Parse string with lex in Haskell

Question

I'm following the Gentle Introduction to Haskell tutorial, and the code presented there seems to be broken. I need to understand whether it really is broken or whether my understanding of the concept is wrong.

I am implementing a parser for a custom type:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

A printing function, for convenience:

showsTree              :: Show a => Tree a -> String -> String
showsTree (Leaf x)     = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)

instance Show a => Show (Tree a) where 
    showsPrec _ x = showsTree x

This parser works, but it breaks when the input contains spaces:

readsTree         :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) =  [(Branch l r, u) | (l, '|':t) <- readsTree s,
                                        (r, '>':u) <- readsTree t ]
readsTree s       =  [(Leaf x, t)     | (x,t)      <- reads s]

This one is said to be a better solution, but it does not work without spaces:

readsTree_lex    :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s  = [(Branch l r, x) | ("<", t) <- lex s,
                                   (l, u)   <- readsTree_lex t,
                                   ("|", v) <- lex u,
                                   (r, w)   <- readsTree_lex v,
                                   (">", x) <- lex w ]
                ++
                [(Leaf x, t)     | (x, t)   <- reads s ]

Next, I pick one of the parsers to use with read:

instance Read a => Read (Tree a) where
    readsPrec _ s = readsTree s

Then I load it in GHCi using Leksah debug mode (this is irrelevant, I guess) and try to parse two strings:

    read "<1|<2|3>>"   :: Tree Int -- succeeds with readsTree
    read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex

When lex encounters the |<2... part of the former string, it splits off ("|<", _). That does not match the ("|", v) <- lex u part of the parser, so parsing fails.

Two questions arise:

How do I define a parser that really ignores spaces rather than requiring them?
How can I define rules for splitting the encountered literals with lex?

As for the second question, I ask it mostly out of curiosity, since defining my own lexer seems more correct than redefining the rules of an existing one.

not my job · Accepted Answer

lex splits into Haskell lexemes, skipping whitespace.

Because |< is a valid Haskell lexeme (an operator symbol), lex returns it as a single token rather than splitting it into | and <; that is how the Haskell lexer itself would read it.
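To see this behaviour directly, here is a small sketch using the two inputs from the question. The names touching and separated are only illustrative; lex itself comes from the Prelude.

```haskell
-- lex is exported by the Prelude, so no imports are needed.

-- Operator characters that touch each other are lexed as one symbol.
touching :: [(String, String)]
touching = lex "|<2|3>>"      -- [("|<","2|3>>")]

-- With a space in between, '|' is returned as a lexeme on its own.
separated :: [(String, String)]
separated = lex "| <2|3> >"   -- [("|"," <2|3> >")]
```

The same thing happens to the closing >> in "<1|<2|3>>", which lex would also return as one token.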

You can only use lex in your parser if you're using the same (or similar) syntactic rules to Haskell.
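When the syntax differs from Haskell's, as it does here, one option is to match the punctuation by hand instead of calling lex. The following is a minimal sketch, not the tutorial's code: symbol and readsTreeWs are hypothetical names, and the derived Eq and Show instances stand in for the hand-written Show instance in the question.

```haskell
import Data.Char (isSpace)

data Tree a = Leaf a | Branch (Tree a) (Tree a)
  deriving (Eq, Show)

-- Hypothetical helper: skip leading whitespace, then require exactly
-- the character c. Returns the remaining input on success.
symbol :: Char -> String -> [String]
symbol c s = case dropWhile isSpace s of
  (x:rest) | x == c -> [rest]
  _                 -> []

-- Same shape as readsTree_lex, but each punctuation character is
-- always its own token, with or without surrounding whitespace.
readsTreeWs :: Read a => String -> [(Tree a, String)]
readsTreeWs s =
  [ (Branch l r, x) | t      <- symbol '<' s
                    , (l, u) <- readsTreeWs t
                    , v      <- symbol '|' u
                    , (r, w) <- readsTreeWs v
                    , x      <- symbol '>' w ]
  ++
  [ (Leaf x, t) | (x, t) <- reads s ]
```

With this sketch, both "<1|<2|3>>" and "<1| <2|3> >" should parse to the same tree, since reads already skips whitespace before a leaf value.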

If you want to ignore all whitespace (as opposed to making any whitespace equivalent to one space), it's much simpler and more efficient to first run filter (not.isSpace).
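As a rough sketch of that suggestion, the character-matching readsTree from the question can be wrapped so that whitespace is removed before parsing. The name readsTreeNoSpaces is hypothetical, and Eq and Show are derived here only to keep the example short. Note the assumption that leaf values never contain meaningful whitespace: the filter would also strip spaces inside a String leaf, for example.

```haskell
import Data.Char (isSpace)

data Tree a = Leaf a | Branch (Tree a) (Tree a)
  deriving (Eq, Show)

-- The character-matching parser from the question, unchanged in behaviour.
readsTree :: Read a => String -> [(Tree a, String)]
readsTree ('<':s) = [ (Branch l r, u) | (l, '|':t) <- readsTree s
                                       , (r, '>':u) <- readsTree t ]
readsTree s       = [ (Leaf x, t) | (x, t) <- reads s ]

-- Hypothetical wrapper: drop every whitespace character, then parse.
readsTreeNoSpaces :: Read a => String -> [(Tree a, String)]
readsTreeNoSpaces = readsTree . filter (not . isSpace)
```

Using readsTreeNoSpaces in the Read instance's readsPrec would then accept the input with or without spaces.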
