Piskator
Piskator

Reputation: 647

The do-notation in Haskell, the >> and the >>= operator and their impact on the final Monad

I am trying to understand Monads and have read and played around with concept from LYAH especially the 'walk the line section': http://learnyouahaskell.com/a-fistful-of-monads#walk-the-line in order to understand how the >>-operator and discarding work.

There does still seem to be some things with regards to the >>= vs. >> that I haven't grasped. At least in the code/MVE below I get different results depending on which of the runRegModuleOne to runRegModuleThree that I use. And I am not entirely sure as to why they behave as they do in all cases, as I point out below.

import qualified Data.Set as S

import Control.Monad

type CharSet = S.Set Char

data RE =
    RClass Bool CharSet
  | RSeq [RE]
  | RAlt RE RE

newtype RegModule d a =
  RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}

instance Monad (RegModule d) where
  return a = RegModule (\_s _i d -> return (a, 0, d))
  m >>= f =
    RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
                            (b, j', d'') <- runRegModule (f a) s (i + j) d'
                            return (b, j + j', d''))

instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap

scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1, d)
    [] -> []
  )

--For getData we simply rreturn the current state without modifying it
getData :: RegModule d d
getData = RegModule (\_s _i d -> [(d, 0, d)])

--The function just returns and empty state and puts that d in data.
putData :: d -> RegModule d ()
putData d = RegModule (\_s _i _d -> [((), 0, d)]) 

--The function returns an eempty list as failure a bit like pfail in readP.
regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
                )

choice :: [RegModule d a] -> RegModule d a
choice [] = regfail
choice (m:ms) = RegModule (\s i d -> 
                      case (runRegModule m s i d) of
                        []   -> (runRegModule (choice ms) s i d)
                        succ -> succ
                      )


regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
  next <- scanChar   
  if (S.member next cs) && not b
    then do         
        d <- getData
        putData (d ++ [[next]])        
    else regfail

regEX (RAlt r1 r2) =
  do    
    let m1 = regEX r1
        m2 = regEX r2
    choice [m1, m2]

regEX (RSeq []) = regfail
regEX (RSeq (r:rs)) = do
  case length (rs) of
    0 -> regEX r
    _tail ->  do
              regEX r
              regEX (RSeq rs)

runRegModuleOne :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleOne matcher input startPos state =
  let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
      (result2, pos2, newState2) = head $ runRegModule matcher input pos1 (newState1)
      (result3, pos3, newState3) = head $ runRegModule matcher input pos2 (newState2)
      (result4, pos4, newState4) = head $ runRegModule matcher input pos3 (newState3)
  in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3), (result4, pos4, newState4)]

runRegModuleTwo :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleTwo matcher input startPos state = do
  (result1, pos1, newState1) <- runRegModule matcher input startPos state
  (result2, pos2, newState2) <- runRegModule matcher input pos1 newState1
  (result3, pos3, newState3) <- runRegModule matcher input pos2 newState2
  (result4, pos4, newState4) <- runRegModule matcher input pos3 newState3
  return (result4, pos4, newState4)


runRegModuleThree :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThree matcher = runRegModule (matcher >> matcher >> matcher >> matcher)

--helper functions for ease of using ghci

runRegModuleThree_helper = runRegModuleThree (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b')),(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []

OUTPUT for using RSeq with the three:

ghci> runRegModuleOne (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[((),2,["a","b"]),((),2,["a","b","a","b"]),((),2,["a","b","a","b","a","b"]),((),2,["a","b","a","b","a","b","a","b"])]

ghci> runRegModuleTwo (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[((),2,["a","b","a","b","a","b","a","b"])]

ghci> runRegModuleThree (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []  
[]

ghci> runRegModuleThree_helper 
[]

Note that "One" and "Two" seems to be successful after a fashion since they are expected to store the letters they match when they find a sequence of letters, it might have been even more correct had they had only 4 letters saved rather than 8, alghough I am unsure of that part, but suspect it is because 4 runs are being made of the RSeq. Further more "One" saves the history of all the runs.

Why does runRegModuleThree seem to fail though, even when I leave out the regEX (RSeq []) = regfail ?

In comparison runRegModuleThree seems to be the only correct version when using RAlt:

ghci> runRegModuleOne (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []
[((),1,["a"]),((),1,["a","b"]),((),1,["a","b","b"]),((),1,["a","b","b","b"])]

ghci> runRegModuleTwo (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []    
[((),1,["a","b","b","b"])]

ghci> runRegModuleThree (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []  
[((),4,["a","b","a","b"])]

--Other parcularities for runRegModuleOne and Two

ghci> runRegModuleOne (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abaa" 0 []   
[((),1,["a"]),((),1,["a","b"]),((),1,["a","b","b"]),((),1,["a","b","b","b"])]

ghci> runRegModuleTwo (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abaa" 0 []     
[((),1,["a","b","b","b"])]

Here it seems that "One" and "Two" stores a 'wrong' letter after t he first, whereas this isn't the case for "Three". For "One and "Two" this is also the case for the input "abaa".

But why doesn't this happen for "Three"? As I understand it the 'discard' doesn't discard the values within the Monad but just corresponds to m >> n = m >>= \_ -> n such that no function is applied to the value within n. In this sense the data stored in the monad due to putData and putData (d ++ [[next]]) are not discarded, since it is a side-effect.

How can one make runRegModuleThree work correctly for both cases of the regEx patternmatch depicted here?

Upvotes: 1

Views: 147

Answers (0)

Related Questions