bdecaf
bdecaf

Reputation: 4732

Parsing string with parsec that needs to end with particular words?

I am working on some programming exercises. The one I am working on has following input format:

Give xxxxxxxxx as yyyy.

xxxxxxxx can be in several formats that repeatedly show up during these exercises. In particular its either binary (groups of 8 separated by spaces), hexadecimal (without spaces) or octal (groups of up to 3 numbers). I have already written parsers for these formats - however they all stumble over the "as". They looked like this

binaryParser = BinaryQuestion  <$> (count 8 ( oneOf "01") ) `sepBy1` space

I solved using this monstrosity (trimmed unnecessary code)

{-# LANGUAGE OverloadedStrings #-}
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

wrapAs :: Parser a -> Parser [a]
wrapAs kind = manyTill kind (try (string " as"))
inputParser :: Parser Input
inputParser = choice [try binaryParser, try (space >> hexParser), try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> wrapAs (space >> count 8 ( oneOf "01") )
hexParser :: Parser Input
hexParser = HexQuestion <$> wrapAs (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$> wrapAs (many1 space >> many1 (oneOf ['0'..'7']))

questionParser :: Parser Question
questionParser = do
  string "Give"
  inp <- inputParser 
  string " a "
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

I don't like that I need to use the following string "as" inside the parsing of Input, and they generally are less readable. I mean using regex it would be trivial to have a trailing string. So I am not satisfied with my solution.

Is there a way I can reuse the 'nice' parsers - or at least use more readable parsers?

additional notes

The code I along the lines I wish I could get working would look like this:

{-# LANGUAGE OverloadedStrings #-}

import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

inputParser :: Parser Input
inputParser = choice [try binaryParser, try hexParser, try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `sepBy1` space
hexParser :: Parser Input
hexParser = HexQuestion <$> many1 (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$>  (many1 (oneOf ['0'..'7'])) `sepBy1` space

questionParser :: Parser Question
questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

but parseTest questionParser test3 will return me parse error at (line 1, column 22): unexpected "a"

I suppose the problem is that space is used as separator inside the input but also comes in the as a string. I don't see any function inside parsec that would fit. In frustration I tried adding try in various places - however no success.

Upvotes: 1

Views: 904

Answers (2)

assembly.jc
assembly.jc

Reputation: 2066

EDIT:

As said in comment, the clean parser cannot be reused by Previouse solution stated at the end of this post.

It led to develop a small parser using Parsec to handle all the possible situations for end parsing of numeric string separated by space i.e.

  1. end with a space followed by non-required-digit character, e.g. "..11 as"
  2. end with a space, e.g. "..11 "
  3. end with eof, e.g. "..11"

and such a parser as below:

numParser:: (Parser Char->Parser String)->[Char]->Parser [String]
numParser repeatParser digits = 
    let digitParser = repeatParser $ oneOf digits
        endParser = (try $ lookAhead $ (space >> noneOf digits)) <|>
                    (try $ lookAhead $ (space <* eof))           <|> 
                    (eof >> return ' ')
    in do init <- digitParser
          rest <- manyTill (space >> digitParser) endParser
          return (init : rest)

And binaryParser and octParser need to be modified as below:

binaryParser = BinaryQuestion <$> numParser (count 8) "01"
octParser    = OctalQuestion  <$> numParser many1 ['0'..'7']

And Nothing need to change of questionParser stated in question, for reference, I state it again here:

questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space       --no need change to many
  string "as a"
  many1 space     
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

Previous Solution:

The functions endBy1 and many in Text.Parsec are helpful in this situation.

To replace sepBy1 by endBy1 as

binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `endBy1` space

and

octParser = OctalQuestion  <$>  (many1 (oneOf ['0'..'7'])) `endBy1` space

Unlike sepBy1, endBy1 will read next some chars to determine whether end the parsing, and therefor, one space after the last digit will be consumed, i.e.

Give 164 151 155 145 as a word.
                    ^ this space will be consumed

So, instead of checking one or many space before "as a...", it need check zero or many space, so why use many function instead of many1, now the code become:

...
inp <- inputParser 
many space            -- change to many
string "as a"
.... 

Upvotes: 1

Yotam Ohad
Yotam Ohad

Reputation: 322

You are working with the pattern: Give {source} as a {target}. So you can pipe:

  • Parser for Give a
  • Parser for {source}
  • Parser for as a
  • Parser for {target}

No need to wrap the parser for {source} with the parser for as a.

Upvotes: 1

Related Questions