Chris Penner

Reputation: 1900

Parsec: Handling Overlapping Parsers

I'm really new to parsing in Haskell, but it's mostly making sense.

I'm building a templating program, mostly to learn parsing better; templates can interpolate values using {{ value }} notation.

Here's my current parser:

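import Text.Parsec
import Text.Parsec.String (Parser, parseFromFile)

-- Show instances are needed by notFollowedBy and to print the result
data Template a = Template [Either String a] deriving Show
data Directive = Directive String deriving Show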

templateFromFile :: FilePath -> IO (Either ParseError (Template Directive))
templateFromFile = parseFromFile templateParser

templateParser :: Parser (Template Directive)
templateParser = do
  tmp <- template
  eof
  return tmp

template :: Parser (Template Directive)
template = Template <$> many (dir <|> txt)
    where
      dir = Right <$> directive
      txt = Left <$> many1 (anyChar <* notFollowedBy directive)

directive :: Parser Directive
directive = do
  _ <- string "{{"
  txt <- manyTill anyChar (string "}}")
  return $ Directive txt

Then I run it on a file something like this:

{{ value }}

This is normal Text

{{ expression }}

When I run this using templateFromFile "./template.txt" I get the error:

Left "./template.txt" (line 5, column 17):
unexpected Directive " expression "

Why is this happening and how can I fix it?

My basic understanding is that many1 (anyChar <* notFollowedBy directive) should grab all of the characters up to the start of the next directive, then fail and return the characters collected so far. Control should then fall back to the enclosing many, which should try dir again and succeed. Clearly something else is happening, though. I'm having trouble figuring out how to parse the text between other constructs when the parsers mostly overlap.

I'd love some tips on how to structure this more idiomatically; please let me know if I'm doing something in a silly way. Cheers, and thanks for your time!

Upvotes: 1

Views: 359

Answers (3)

James Brock

Reputation: 3426

replace-megaparsec is a library for doing search-and-replace with parsers. The search-and-replace function is streamEdit, which can find your {{ value }} patterns and then substitute in some other text.

streamEdit is built from a generalized version of your template function called sepCap.

import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Char
import Data.Void (Void)

input = unlines
    [ "{{ value }}"
    , ""
    , "This is normal Text"
    , ""
    , "{{ expression }}"
    ]

directive :: Parsec Void String String
directive = do
    _ <- string "{{"
    txt <- manyTill anySingle (string "}}")
    return txt

editor k = fmap toUpper k

streamEdit directive editor input
 VALUE 

This is normal Text

 EXPRESSION 

Upvotes: 1

K. A. Buhr

Reputation: 50864

You have a couple of problems. First, in Parsec, if a parser consumes any input and then fails, that's an error: combinators such as <|> and many will not backtrack past it. So, when the parser:

anyChar <* notFollowedBy directive

fails (because the character is followed by a directive), it fails after anyChar has consumed input, and that generates an error immediately. Therefore, the parser:

let p1 = many1 (anyChar <* notFollowedBy directive)

will never succeed if it runs into a directive. For example:

parse p1 "" "okay"   -- works
parse p1 "" "oops {{}}"  -- will fail after consuming "oops "

You can fix this by inserting a try clause:

let p2 = many1 (try (anyChar <* notFollowedBy directive))
parse p2 "" "okay {{}}"

which yields Right "okay" and reveals the second problem. Parser p2 only consumes characters that aren't followed by a directive, so that excludes the space immediately before the directive, and you have no means in your parser to consume a character that is followed by a directive, so it gets stuck.
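To see both problems side by side, here is a minimal sketch that can be loaded into GHCi. The derived Show/Eq instances and the pieces helper (the question's many loop followed by eof, parameterised on the text parser) are assumptions added for illustration; they are not part of the original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Show is required by notFollowedBy; Eq is only here to compare results.
newtype Directive = Directive String deriving (Show, Eq)

directive :: Parser Directive
directive = do
  _ <- string "{{"
  txt <- manyTill anyChar (string "}}")
  return (Directive txt)

-- The question's text parser: errors out once it meets a directive,
-- because anyChar has already consumed the preceding character.
p1 :: Parser String
p1 = many1 (anyChar <* notFollowedBy directive)

-- With try: backtracks cleanly, but can never consume the character
-- that sits immediately before a directive.
p2 :: Parser String
p2 = many1 (try (anyChar <* notFollowedBy directive))

-- Hypothetical helper: the template loop from the question, followed by eof.
pieces :: Parser String -> Parser [Either String Directive]
pieces txt = many ((Right <$> directive) <|> (Left <$> txt)) <* eof

p1Result, p2Result :: Either ParseError String
p1Result = parse p1 "" "oops {{}}"   -- a Left: failed after consuming input
p2Result = parse p2 "" "okay {{}}"   -- Right "okay", the space is left behind

stuck :: Either ParseError [Either String Directive]
stuck = parse (pieces p2) "" "okay {{}}"  -- a Left: nothing can consume the space
```

Evaluating stuck shows the loop stopping at the space: neither directive nor p2 accepts it, so eof fails.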

You actually want something like:

let p3 = many1 (notFollowedBy directive *> anyChar)

which first checks that, at the current position, we aren't looking at a directive before grabbing a character. No try clause is needed because if this fails, it fails without consuming input. (notFollowedBy never consumes input, as per the documentation.)

parse p3 "" "okay" -- returns Right "okay"
parse p3 "" "okay {{}}" -- returns Right "okay "
parse p3 "" "{{fails}}"  -- correctly fails w/o consuming input
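The claim that notFollowedBy never consumes input can be checked in isolation. In this small sketch, string "ab" is just a stand-in for directive, chosen so that the lookahead can match partially; the names lookaheadPasses and lookaheadFails are made up for the example.

```haskell
import Text.Parsec

-- The lookahead matches 'a' and then fails on 'c', so notFollowedBy
-- succeeds. Nothing was consumed: many anyChar still sees the whole input.
lookaheadPasses :: Either ParseError String
lookaheadPasses = parse (notFollowedBy (string "ab") *> many anyChar) "" "ac"

-- Here the lookahead matches completely, so notFollowedBy fails.
-- It fails without consuming input, so the (<|>) fallback still runs.
lookaheadFails :: Either ParseError Char
lookaheadFails =
  parse ((notFollowedBy (string "ab") *> anyChar) <|> return '?') "" "ab"
```

lookaheadPasses evaluates to Right "ac" and lookaheadFails to Right '?'. That is why p3 needs no try.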

So, taking your original example with:

txt = Left <$> many1 (notFollowedBy directive *> anyChar)

should work fine.
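Putting it together, here is a sketch of the whole parser with that one-line change applied. It assumes derived Show/Eq instances, folds eof into template, and inlines the sample file as a string (with a trailing newline, which is a guess about the real file) so that it can be run without touching the disk.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Template a = Template [Either String a] deriving (Show, Eq)
data Directive = Directive String deriving (Show, Eq)

directive :: Parser Directive
directive = do
  _ <- string "{{"
  txt <- manyTill anyChar (string "}}")
  return (Directive txt)

template :: Parser (Template Directive)
template = Template <$> many (dir <|> txt) <* eof
  where
    dir = Right <$> directive
    -- Check for a directive *before* consuming each character.
    txt = Left <$> many1 (notFollowedBy directive *> anyChar)

-- The sample file from the question, inlined (trailing newline assumed).
sample :: String
sample = "{{ value }}\n\nThis is normal Text\n\n{{ expression }}\n"

parsed :: Either ParseError (Template Directive)
parsed = parse template "" sample
```

In GHCi, parsed comes back as a Right holding four pieces: Directive " value ", the text "\n\nThis is normal Text\n\n", Directive " expression ", and the final "\n".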

Upvotes: 4

Li-yao Xia

Reputation: 33429

many1 (anyChar <* notFollowedBy directive)

This parses only characters not followed by a directive.

{{ value }}

This is normal Text

{{ expression }}

When parsing the text in the middle, it stops at the last t and leaves the newline before the directive unconsumed (because that newline is, well, a character followed by a directive). On the next iteration, you try to parse a directive and fail. Then you retry txt on that newline: the parser expects it not to be followed by a directive, but it finds one, hence the error.

Upvotes: 0
