Reputation: 647
I am trying to understand Monads and have read and played around with concept from LYAH especially the 'walk the line section': http://learnyouahaskell.com/a-fistful-of-monads#walk-the-line in order to understand how the >>
-operator and discarding work.
There does still seem to be some things with regards to the >>=
vs. >>
that I haven't grasped. At least in the code/MVE below I get different results depending on which of the runRegModuleOne
to runRegModuleThree
that I use. And I am not entirely sure as to why they behave as they do in all cases, as I point out below.
import qualified Data.Set as S
import Control.Monad
type CharSet = S.Set Char
data RE =
RClass Bool CharSet
| RSeq [RE]
| RAlt RE RE
newtype RegModule d a =
RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}
instance Monad (RegModule d) where
return a = RegModule (\_s _i d -> return (a, 0, d))
m >>= f =
RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
(b, j', d'') <- runRegModule (f a) s (i + j) d'
return (b, j + j', d''))
instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap
scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
case drop i s of
(c:cs) -> return (c, 1, d)
[] -> []
)
--For getData we simply rreturn the current state without modifying it
getData :: RegModule d d
getData = RegModule (\_s _i d -> [(d, 0, d)])
--The function just returns and empty state and puts that d in data.
putData :: d -> RegModule d ()
putData d = RegModule (\_s _i _d -> [((), 0, d)])
--The function returns an eempty list as failure a bit like pfail in readP.
regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
)
choice :: [RegModule d a] -> RegModule d a
choice [] = regfail
choice (m:ms) = RegModule (\s i d ->
case (runRegModule m s i d) of
[] -> (runRegModule (choice ms) s i d)
succ -> succ
)
regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
next <- scanChar
if (S.member next cs) && not b
then do
d <- getData
putData (d ++ [[next]])
else regfail
regEX (RAlt r1 r2) =
do
let m1 = regEX r1
m2 = regEX r2
choice [m1, m2]
regEX (RSeq []) = regfail
regEX (RSeq (r:rs)) = do
case length (rs) of
0 -> regEX r
_tail -> do
regEX r
regEX (RSeq rs)
runRegModuleOne :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleOne matcher input startPos state =
let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
(result2, pos2, newState2) = head $ runRegModule matcher input pos1 (newState1)
(result3, pos3, newState3) = head $ runRegModule matcher input pos2 (newState2)
(result4, pos4, newState4) = head $ runRegModule matcher input pos3 (newState3)
in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3), (result4, pos4, newState4)]
runRegModuleTwo :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleTwo matcher input startPos state = do
(result1, pos1, newState1) <- runRegModule matcher input startPos state
(result2, pos2, newState2) <- runRegModule matcher input pos1 newState1
(result3, pos3, newState3) <- runRegModule matcher input pos2 newState2
(result4, pos4, newState4) <- runRegModule matcher input pos3 newState3
return (result4, pos4, newState4)
runRegModuleThree :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThree matcher = runRegModule (matcher >> matcher >> matcher >> matcher)
--helper functions for ease of using ghci
runRegModuleThree_helper = runRegModuleThree (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b')),(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
OUTPUT for using RSeq
with the three:
ghci> runRegModuleOne (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[((),2,["a","b"]),((),2,["a","b","a","b"]),((),2,["a","b","a","b","a","b"]),((),2,["a","b","a","b","a","b","a","b"])]
ghci> runRegModuleTwo (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[((),2,["a","b","a","b","a","b","a","b"])]
ghci> runRegModuleThree (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[]
ghci> runRegModuleThree_helper
[]
Note that "One" and "Two" seems to be successful after a fashion since they are expected to store the letters they match when they find a sequence of letters, it might have been even more correct had they had only 4 letters saved rather than 8, alghough I am unsure of that part, but suspect it is because 4 runs are being made of the RSeq
.
Further more "One" saves the history of all the runs.
Why does runRegModuleThree
seem to fail though, even when I leave out the regEX (RSeq []) = regfail
?
In comparison runRegModuleThree
seems to be the only correct version when using RAlt
:
ghci> runRegModuleOne (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []
[((),1,["a"]),((),1,["a","b"]),((),1,["a","b","b"]),((),1,["a","b","b","b"])]
ghci> runRegModuleTwo (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []
[((),1,["a","b","b","b"])]
ghci> runRegModuleThree (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []
[((),4,["a","b","a","b"])]
--Other parcularities for runRegModuleOne and Two
ghci> runRegModuleOne (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abaa" 0 []
[((),1,["a"]),((),1,["a","b"]),((),1,["a","b","b"]),((),1,["a","b","b","b"])]
ghci> runRegModuleTwo (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abaa" 0 []
[((),1,["a","b","b","b"])]
Here it seems that "One" and "Two" stores a 'wrong' letter after t he first, whereas this isn't the case for "Three". For "One and "Two" this is also the case for the input "abaa"
.
But why doesn't this happen for "Three"? As I understand it the 'discard' doesn't discard the values within the Monad but just corresponds to m >> n = m >>= \_ -> n
such that no function is applied to the value within n
. In this sense the data stored in the monad due to putData
and putData (d ++ [[next]])
are not discarded, since it is a side-effect.
How can one make runRegModuleThree
work correctly for both cases of the regEx
patternmatch depicted here?
Upvotes: 1
Views: 147