Reputation: 4732
I am working on some programming exercises. The one I am working on has following input format:
Give xxxxxxxxx as yyyy.
xxxxxxxx can be in several formats that repeatedly show up during these exercises. In particular its either binary (groups of 8 separated by spaces), hexadecimal (without spaces) or octal (groups of up to 3 numbers). I have already written parsers for these formats - however they all stumble over the "as". They looked like this
binaryParser = BinaryQuestion <$> (count 8 ( oneOf "01") ) `sepBy1` space
I solved using this monstrosity (trimmed unnecessary code)
{-# LANGUAGE OverloadedStrings #-}
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B
data Input = BinaryQuestion [String]
| HexQuestion [String]
| OctalQuestion [String]
deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show
test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."
targetParser :: Parser Target
targetParser = string "word" >> return Word
wrapAs :: Parser a -> Parser [a]
wrapAs kind = manyTill kind (try (string " as"))
inputParser :: Parser Input
inputParser = choice [try binaryParser, try (space >> hexParser), try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion <$> wrapAs (space >> count 8 ( oneOf "01") )
hexParser :: Parser Input
hexParser = HexQuestion <$> wrapAs (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion <$> wrapAs (many1 space >> many1 (oneOf ['0'..'7']))
questionParser :: Parser Question
questionParser = do
string "Give"
inp <- inputParser
string " a "
tar <- targetParser
char '.'
eof
return $ Question inp tar
I don't like that I need to use the following string "as" inside the parsing of Input, and they generally are less readable. I mean using regex it would be trivial to have a trailing string. So I am not satisfied with my solution.
Is there a way I can reuse the 'nice' parsers - or at least use more readable parsers?
additional notes
The code I along the lines I wish I could get working would look like this:
{-# LANGUAGE OverloadedStrings #-}
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B
data Input = BinaryQuestion [String]
| HexQuestion [String]
| OctalQuestion [String]
deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show
test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."
targetParser :: Parser Target
targetParser = string "word" >> return Word
inputParser :: Parser Input
inputParser = choice [try binaryParser, try hexParser, try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion <$> count 8 ( oneOf "01") `sepBy1` space
hexParser :: Parser Input
hexParser = HexQuestion <$> many1 (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion <$> (many1 (oneOf ['0'..'7'])) `sepBy1` space
questionParser :: Parser Question
questionParser = do
string "Give"
many1 space
inp <- inputParser
many1 space
string "as a"
many1 space
tar <- targetParser
char '.'
eof
return $ Question inp tar
but parseTest questionParser test3
will return me parse error at (line 1, column 22):
unexpected "a"
I suppose the problem is that space is used as separator inside the input but also comes in the as a
string. I don't see any function inside parsec that would fit. In frustration I tried adding try
in various places - however no success.
Upvotes: 1
Views: 904
Reputation: 2066
EDIT:
As said in comment, the clean parser cannot be reused by Previouse solution stated at the end of this post.
It led to develop a small parser using Parsec to handle all the possible situations for end parsing of numeric string separated by space i.e.
eof
, e.g. "..11"and such a parser as below:
numParser:: (Parser Char->Parser String)->[Char]->Parser [String]
numParser repeatParser digits =
let digitParser = repeatParser $ oneOf digits
endParser = (try $ lookAhead $ (space >> noneOf digits)) <|>
(try $ lookAhead $ (space <* eof)) <|>
(eof >> return ' ')
in do init <- digitParser
rest <- manyTill (space >> digitParser) endParser
return (init : rest)
And binaryParser
and octParser
need to be modified as below:
binaryParser = BinaryQuestion <$> numParser (count 8) "01"
octParser = OctalQuestion <$> numParser many1 ['0'..'7']
And Nothing need to change of questionParser stated in question, for reference, I state it again here:
questionParser = do
string "Give"
many1 space
inp <- inputParser
many1 space --no need change to many
string "as a"
many1 space
tar <- targetParser
char '.'
eof
return $ Question inp tar
Previous Solution:
The functions endBy1
and many
in Text.Parsec
are helpful in this situation.
To replace sepBy1
by endBy1
as
binaryParser = BinaryQuestion <$> count 8 ( oneOf "01") `endBy1` space
and
octParser = OctalQuestion <$> (many1 (oneOf ['0'..'7'])) `endBy1` space
Unlike sepBy1
, endBy1
will read next some chars to determine whether end the parsing, and therefor, one space after the last digit will be consumed, i.e.
Give 164 151 155 145 as a word.
^ this space will be consumed
So, instead of checking one or many space before "as a...", it need check zero or many space, so why use many
function instead of many1
, now the code become:
...
inp <- inputParser
many space -- change to many
string "as a"
....
Upvotes: 1
Reputation: 322
You are working with the pattern: Give {source} as a {target}
.
So you can pipe:
Give a
{source}
as a
{target}
No need to wrap the parser for {source}
with the parser for as a
.
Upvotes: 1