Reputation: 1323
I don't understand how to use the lexeme function
I have seen the above question, but I still don't understand.
The example in the documentation, for instance, also does not work.
mainParser = do{ whiteSpace
; ds <- many (lexeme digit)
; eof
; return (sum ds)
}
Upvotes: 4
Views: 2218
Reputation: 1077
Disclaimer: I am not an expert in either Haskell or parsing. I have modified the above code a little bit:
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String ( Parser )
import Text.Parsec.Language (haskellDef)
lexer = T.makeTokenParser haskellDef
whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer
lexeme = T.lexeme lexer
mainParser = do whiteSpace
                ds <- many digit
                eof
                return ds
Let's run the above code.
Mukeshs-MacBook-Pro:Compilers mukeshtiwari$ ghci stmp.hs
GHCi, version 7.6.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 1] Compiling Main ( stmp.hs, interpreted )
Ok, modules loaded: Main.
*Main> parse mainParser "" "1"
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.0 ... linking ... done.
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package text-0.11.2.3 ... linking ... done.
Loading package parsec-3.1.3 ... linking ... done.
Right "1"
*Main> parse mainParser "" "12"
Right "12"
*Main> parse mainParser "" "123"
Right "123"
*Main> parse mainParser "" " 123"
Right "123"
Everything looks good so far. Now let's try some more input.
*Main> parse mainParser "" "123 "
Left (line 1, column 4):
unexpected ' '
expecting digit or end of input
Oops! Something went wrong with our parser. Can you spot the difference in the input? The last input has a space at the end. But then how is this parser able to handle the spaces before the number? Remember the whiteSpace function: it eats all the spaces before the number and passes the remaining input to the rest of the code (many digit), which keeps consuming digits until it encounters something that is not a digit. The rest of the input (in our case, the trailing space) is then passed to eof, so our parser complains about the space. Can we ignore these spaces while reading the digits? We know that whiteSpace eats zero or more spaces, so let's add it to our code (ignore <* for a moment).
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String ( Parser )
import Text.Parsec.Language (haskellDef)
import Control.Applicative ( (<*) )
lexer = T.makeTokenParser haskellDef
whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer
lexeme = T.lexeme lexer
mainParser = do whiteSpace
                ds <- many ( digit <* whiteSpace )
                eof
                return ds
and after running this code
*Main> parse mainParser "" " 31312 "
Right "31312"
*Main> parse mainParser "" " 3131 2 "
Right "31312"
*Main> parse mainParser "" " 313 1 2 "
Right "31312"
*Main> parse mainParser "" " 3 1 3 1 2 "
Right "31312"
*Main> parse mainParser "" " 31 3 1 2 "
Right "31312"
Now it looks fine. Let's see how this code handles the spaces. All initial spaces are consumed by whiteSpace, and the remaining input is passed to the next parser, many ( digit <* whiteSpace ). Here digit consumes a single digit, whiteSpace then consumes zero or more spaces, and the result of the combined parser is the result of digit. Looking at the documentation of lexeme, lexeme p first applies parser p and then the whiteSpace parser, so lexeme digit will first consume a digit and then zero or more spaces. In other words, with lexeme defined as above, many (lexeme digit) does the same job as many ( digit <* whiteSpace ).
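To check the claim that lexeme digit behaves like digit <* whiteSpace, here is a small sketch that puts both versions side by side. It assumes parsec 3 and the same haskellDef token parser as above; the names lexemeParser and manualParser are hypothetical, chosen only for this comparison.

```haskell
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellDef)
import Control.Applicative ((<*))

lexer :: T.TokenParser ()
lexer = T.makeTokenParser haskellDef

whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer

-- Hypothetical name: uses the library's lexeme, which runs digit and
-- then skips any trailing white space (with haskellDef, comments too).
lexemeParser :: Parser String
lexemeParser = do
  whiteSpace
  ds <- many (T.lexeme lexer digit)
  eof
  return ds

-- Hypothetical name: the same idea written by hand with (<*),
-- keeping the result of digit and discarding the result of whiteSpace.
manualParser :: Parser String
manualParser = do
  whiteSpace
  ds <- many (digit <* whiteSpace)
  eof
  return ds
```

In GHCi, both parse lexemeParser "" " 3 1 3 1 2 " and parse manualParser "" " 3 1 3 1 2 " should give Right "31312", and both should now accept "123 " with its trailing space. Because haskellDef is used, whiteSpace also skips Haskell-style comments such as -- ..., not just spaces.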
Upvotes: 5
Reputation: 105935
Disclaimer: I haven't used Parsec yet. That being said, lexeme is a field of GenTokenParser s u m. If you inspect its type in GHCi, you'll end up with
lexeme :: GenTokenParser s u m -> ParsecT s u m a -> ParsecT s u m a
Therefore, you first need a generic token parser, which you can create with makeTokenParser. The latter has the type:
makeTokenParser
  :: Stream s m Char =>
     Text.Parsec.Token.GenLanguageDef s u m
     -> Text.Parsec.Token.GenTokenParser s u m
It takes a language definition and returns a token parser. Since you don't have any specific language in mind, you can use emptyDef from Text.Parsec.Language. Note that whiteSpace also takes a GenTokenParser. Finally, in this setup you will end up with ds :: [Char], so you need to use digitToInt from Data.Char before you can actually sum your digits:
import Text.Parsec
import Text.Parsec.Token (lexeme, makeTokenParser, whiteSpace)
import Text.Parsec.Language (emptyDef)
import Data.Char (digitToInt)
lexer = makeTokenParser emptyDef
mainParser = do{ whiteSpace lexer
; ds <- many (lexeme lexer digit)
; eof
; return (sum . map digitToInt $ ds)
}
main = do
  putStrLn "Please give some digits (whitespaces are ignored)"
  line <- getLine
  case parse mainParser "" line of
    Right n -> putStrLn $ "Sum of digits is " ++ show n
    Left _  -> putStrLn $ "Couldn't parse your line"
Example output:
*Main> :main
Please give some digits (whitespaces are ignored)
7 8 91 72 3945 01 92
Sum of digits is 67
*Main> :main
Please give some digits (whitespaces are ignored)
abc 1
Couldn't parse your line
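The same lexeme trick works for any token, not just a single digit. As a sketch, assuming plain decimal numbers separated by white space are all you need, here is the answer's parser wrapped as a pure function, plus a hypothetical variant that applies lexeme to many1 digit so that each run of digits (91, 3945, ...) is summed as one number. The names sumDigits and sumNumbers are made up for illustration; read is safe here only because many1 digit guarantees a non-empty string of decimal digits.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Token (TokenParser, lexeme, makeTokenParser, whiteSpace)
import Text.Parsec.Language (emptyDef)
import Data.Char (digitToInt)

lexer :: TokenParser ()
lexer = makeTokenParser emptyDef

-- Hypothetical name: sums individual digits, exactly like mainParser above,
-- but returns the result instead of printing it.
sumDigits :: String -> Either ParseError Int
sumDigits = parse p ""
  where
    p :: Parser Int
    p = do
      whiteSpace lexer
      ds <- many (lexeme lexer digit)
      eof
      return (sum (map digitToInt ds))

-- Hypothetical variant: lexeme wraps a multi-digit parser, so spaces
-- separate whole numbers instead of single digits.
sumNumbers :: String -> Either ParseError Integer
sumNumbers = parse p ""
  where
    p :: Parser Integer
    p = do
      whiteSpace lexer
      ns <- many (lexeme lexer (many1 digit))
      eof
      return (sum (map read ns))
```

For the input above, sumDigits should give Right 67 and sumNumbers should give Right 4216 (7 + 8 + 91 + 72 + 3945 + 1 + 92); both return a Left for "abc 1".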
Upvotes: 5