sukhmel
sukhmel

Reputation: 1482

Parse string with lex in Haskell

I'm following Gentle introduction to Haskell tutorial and the code presented there seems to be broken. I need to understand whether it is so, or my seeing of the concept is wrong.

I am implementing parser for custom type:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

printing function for convenience

showsTree              :: Show a => Tree a -> String -> String
showsTree (Leaf x)     = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)

instance Show a => Show (Tree a) where 
    showsPrec _ x = showsTree x

this parser is fine but breaks when there are spaces

readsTree         :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) =  [(Branch l r, u) | (l, '|':t) <- readsTree s,
                                        (r, '>':u) <- readsTree t ]
readsTree s       =  [(Leaf x, t)     | (x,t)      <- reads s]

this one is said to be a better solution, but it does not work without spaces

readsTree_lex    :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s  = [(Branch l r, x) | ("<", t) <- lex s,
                                   (l, u)   <- readsTree_lex t,
                                   ("|", v) <- lex u,
                                   (r, w)   <- readsTree_lex v,
                                   (">", x) <- lex w ]
                ++
                [(Leaf x, t)     | (x, t)   <- reads s ]

next I pick one of parsers to use with read

instance Read a => Read (Tree a) where
    readsPrec _ s = readsTree s

then I load it in ghci using Leksah debug mode (this is unrelevant, I guess), and try to parse two strings:

    read "<1|<2|3>>"   :: Tree Int -- succeeds with readsTree
    read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex

when lex encounters |<2... part of the former string, it splits onto ("|<", _). That does not match ("|", v) <- lex u part of parser and fails to complete parsing.

There are two questions arising:

  1. how do I define parser that really ignores spaces, not requires them?
  2. how can I define rules for splitting encountered literals with lex

speaking of second question -- it is asked more of curiousity as defining my own lexer seems to be more correct than defining rules of existing one.

Upvotes: 4

Views: 1767

Answers (2)

sukhmel
sukhmel

Reputation: 1482

The answer to this seems to be a small gap between text of Gentle introduction to Haskell and its code samples, plus an error in sample code.

there should also be one more lexer, but there is no working example (satisfying my need) in codebase, so I written one. Please point out any flaw in it:

lexAll :: ReadS String
lexAll s = case lex s of
            [("",_)] -> []                                  -- nothing to parse.
            [(c, r)] -> if length c == 1 then [(c, r)]      -- we will try to match
                           else [(c, r), ([head s], tail s)]-- not only as it was 
            any_else -> any_else                            -- parsed but also splitted

author sais:

Finally, the complete reader. This is not sensitive to white space as were the previous versions. When you derive the Show class for a data type the reader generated automatically is similar to this in style.

but lexAll should be used instead of lex (which seems to be said error):

readsTree' :: (Read a) => ReadS (Tree a)
readsTree' s = [(Branch l r, x) | ("<", t) <- lexAll s,
                  (l, u)   <- readsTree' t,
                                  ("|", v) <- lexAll u,
                                  (r, w)   <- readsTree' v,
                  (">", x) <- lexAll w ]
                ++
                [(Leaf x, t)    | (x, t) <- reads s]

Upvotes: 2

not my job
not my job

Reputation: 672

lex splits into Haskell lexemes, skipping whitespace.

This means that since Haskell permits |< as a lexeme, lex will not split it into two lexemes, since that's not how it parses in Haskell.

You can only use lex in your parser if you're using the same (or similar) syntactic rules to Haskell.

If you want to ignore all whitespace (as opposed to making any whitespace equivalent to one space), it's much simpler and more efficient to first run filter (not.isSpace).

Upvotes: 4

Related Questions