Dfr
Dfr

Reputation: 4185

Haskell laziness question or why this monad not working as expected

here is kinda huge piece of code, it finally got huge, because of monadic stuff, but task is simple: parse following string to data structure:

"hello(some, args)" -> [("fid","hello"),("sym","("),("args","some, args"),("sym",")")]

but code i written produces following:

"hello(some, args)" -> [("fid",""),("sym","("),("args",""),("sym",")")]

as you can see 'args' and 'fid' values are lost somewhere on the way, i suspect compiler decided not to calculate them for some mysterious reason.

I guess the code is completely bad, also i marked with "?"'s parts which seem useless to me, but compiler forced me to leave them in place :)

And here is the code:

type PStream = String
type PToken a = (String, a)
data Pstate a = Pstate (String -> ([PToken String], PStream)) a

instance Monad Pstate where
    return x = Pstate (\_ -> ([("start", "")], "?")) x -- not used ?
    (Pstate bindparser v) >>= f  = Pstate newparser fv
        where
            Pstate fparser fv = f v
            (btok, brest) = bindparser "this string also not used"
            (tok, rest) = fparser brest
            newparser _ = (btok ++ tok, rest)

-- parsers
parseFid :: Pstate String
parseFid = Pstate parser "???"
    where parser r = let (fid, rest) = span (/= '(') r in ([("fid", fid)],rest)

parseSym :: Char -> Pstate String
parseSym c = Pstate parser "???"
    where parser r = let rest = parseOne c r in ([("sym", [c])],rest)

parseOne s (h:t) = if h == s then t else error $ "symbol not match:" ++ [h] ++ " /= " ++ [s]
parseOne s []    = []

parseArgs :: Pstate String
parseArgs = Pstate parser "???"
    where parser r = let (args,rest) = span (/=')') r in ([("args", args)],rest)

-- util
load :: String -> Pstate String
load s = Pstate (\ls -> ([("load", "")],ls)) "???"

runP :: Pstate String -> ([PToken String], PStream)
runP (Pstate fparser fv) = fparser "???"

-- combined parser
parseFunction :: String -> Pstate String
parseFunction s = do
    load s --- should be 'return' here ?
    parseFid
    parseSym '('
    parseArgs
    parseSym ')'

main = putStrLn $ show $ runP $ parseFunction "hello(a b c)"

Upvotes: 4

Views: 321

Answers (2)

fuz
fuz

Reputation: 93172

First, about the "???" you had to leave there. Consider your definition of Pstate:

data Pstate a = Pstate (String -> ([PToken String], PStream)) a

This means, that your data constructor has the following type:

Pstate :: (String -> ([PToken String], PStream)) -> a -> Pstate a

This is the default construct of a monad. If you define monadic combinators, it's actually not uncommon to have some combinators where this is not needed, so the convention is to leave it to () in this case.

But actually I think that your code is very strange, it seems like you didn't grabbed the point of a stateful monad. Let me explain:

Usually, a stateful computation has this type:

data MyState a = MyState (TypeOfState -> (a, TypeOfState))

This means, that your monadic action is actually some kind of computation, that does something (possible with your piece of state) and than returns a result and a new state. The state is wrapped up in the monad, so you don't have to think about it.

In your code, you're using the same pattern, but somewhat different. It seems like that you fixed the result of the computation to [PToken String]. Let me fix up your definition a bit:

data Pstate a = Pstate (PStream -> (a, PStream))

So now, you get the return value of your computation by applying the combinators, which look like this:

instance Monad Pstate where
  -- return should just wrap the computation up, so no changes
  return x = Pstate (\p -> (x,p))
  parser1 >>= parser2  = Pstate $ \input -> let
    Pstate parser1' = parser1
    Pstate parser2' = parser2
    (output, rest) = parser1' input
    result = parser2' output rest
    in result

Now, you can look at the type signatures for your parsers, they should be something like this: parseFid :: Pstate [PToken PStream]. This means, your parser consumes some input and returns the parsed stuff as [PToken PStream] and sets the new input to what is left. Consider this definition of parseFid about how it could look like:

parseFid :: Pstate [PToken PStream]
parseFid = Pstate $ \r -> let (fid, rest) = span (/= '(') r in ([("fid", fid)],rest)

The rest is left as an exercise to the reader. I would suggest you to reformulate your parser using the State monad from Control.Monad.State.Strict instead. You'll see, that the monad above is basically the same.


Actually, it's most times easier to rely on existing and well known tools, instead of rolling down an own parser. Here's a parser for what you need created with Parsec, a state of the art library for parsing:

import Text.Parsec

parseFunction = do name   <- parseName
                   obrace <- parseOpeningBrace
                   args   <- parseArguments
                   cbrace <- parseClosingBrace
                   return [name,obrace,args,cbrace]

parseName         = many (noneOf "(") >>= \name -> return ("fid",name)
parseOpeningBrace = char '(' >> return ("sym","(")
parseArguments    = many (noneOf ")") >>= \name -> return ("args",name)
parseClosingBrace = char ')' >> return ("sym",")")

main = case parse parseFunction "" "hello(some, args)" of
  Left error   -> print error
  Right result -> print result

Here's the output:

[("fid","hello"),("sym","("),("args","some, args"),("sym",")")]

I actually would suggest you to think of some better representation of the parsed function, this may make things easier.

Upvotes: 3

sth
sth

Reputation: 229934

If you run the code as posted, you can see that the "this string also not used" string is in fact used, as you get this output:

([("load",""),("fid","this string also not used"),("sym","("),("args","this string also not used"),("sym",")")],"")

In fact the string is basically used as input for all the parsers. In the definition of >>=, the string is given as input to the bindparser. This parser then takes it as it's input and creates tokens from it. parseFid for example produces the token ("fid","this string also not used").

The newparser that is constructed in >>= ignores any input it might receive later, it just returns the result of parsing "this string also not used". Similar, the parser created with return ignores the value it should return.

The parsers created with bind should not ignore/override their inputs for parsing to work correctly.

Also you should decide what role the second parameter of Pstate should fulfill, since at the moment is mostly contains "???", which doesn't look particularly useful.

Upvotes: 2

Related Questions