Reputation: 1900
I'm really new to parsing in Haskell, but it's mostly making sense.
I'm building a templating program, mostly to learn parsing better; templates can interpolate values via {{ value }} notation.
Here's my current parser:
import Text.Parsec
import Text.Parsec.String (Parser, parseFromFile)

data Template a = Template [Either String a] deriving Show
data Directive = Directive String deriving Show

templateFromFile :: FilePath -> IO (Either ParseError (Template Directive))
templateFromFile = parseFromFile templateParser

templateParser :: Parser (Template Directive)
templateParser = do
  tmp <- template
  eof
  return tmp

template :: Parser (Template Directive)
template = Template <$> many (dir <|> txt)
  where
    dir = Right <$> directive
    txt = Left <$> many1 (anyChar <* notFollowedBy directive)

directive :: Parser Directive
directive = do
  _ <- string "{{"
  txt <- manyTill anyChar (string "}}")
  return $ Directive txt
Then I run it on a file like this:
{{ value }}

This is normal Text

{{ expression }}
When I run this using templateFromFile "./template.txt", I get the error:
Left "./template.txt" (line 5, column 17):
unexpected Directive " expression "
Why is this happening and how can I fix it?
My basic understanding is that many1 (anyChar <* notFollowedBy directive) should grab all of the characters up to the start of the next directive, then stop and return the list of characters collected so far. It should then fall back to the enclosing many, try parsing dir again, and succeed. Clearly something else is happening, though. I'm having trouble figuring out how to parse text that sits between other constructs when the parsers mostly overlap.
I'd love some tips on how to structure this more idiomatically; please let me know if I'm doing something in a silly way. Cheers, and thanks for your time!
Upvotes: 1
Reputation: 3426
replace-megaparsec is a library for doing search-and-replace with parsers. The search-and-replace function is streamEdit, which can find your {{ value }} patterns and then substitute in some other text. streamEdit is built from a generalized version of your template function called sepCap.
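import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Char (toUpper)
import Data.Void (Void)

input :: String
input = unlines
  [ "{{ value }}"
  , ""
  , "This is normal Text"
  , ""
  , "{{ expression }}"
  ]

-- Match "{{" and return everything up to the closing "}}".
directive :: Parsec Void String String
directive = do
  _ <- string "{{"
  txt <- manyTill anySingle (string "}}")
  return txt

-- Replace each matched directive with its upper-cased contents.
editor :: String -> String
editor k = fmap toUpper k

main :: IO ()
main = putStr (streamEdit directive editor input)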
 VALUE 

This is normal Text

 EXPRESSION 
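For comparison, the idea behind sepCap and streamEdit can be sketched in plain Parsec, which ships with GHC, so no extra packages are needed. sepCapSketch and streamEditSketch are hypothetical names, and this is a simplified illustration of the approach, not replace-megaparsec's actual implementation. It assumes the pattern always consumes input when it succeeds (Parsec's many rejects parsers that succeed on empty input).

```haskell
import Data.Char (toUpper)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, simplified analogue of sepCap: split the input into
-- unmatched text (Left) and successful matches of the pattern (Right).
-- Assumes the pattern consumes input whenever it succeeds.
sepCapSketch :: Parser a -> Parser [Either String a]
sepCapSketch pat = many (match <|> unmatched)
  where
    -- try rewinds a partial match, so it can be treated as plain text
    match = Right <$> try pat
    -- take characters only while the pattern does not match here;
    -- () <$ pat avoids needing a Show instance for the match type
    unmatched = Left <$> many1 (notFollowedBy (() <$ pat) *> anyChar)

-- Hypothetical analogue of streamEdit: replace every match with f applied to it.
streamEditSketch :: Parser a -> (a -> String) -> String -> String
streamEditSketch pat f input =
  case parse (sepCapSketch pat <* eof) "" input of
    Left _      -> input -- not expected: every character ends up in some piece
    Right parts -> concatMap (either id f) parts

-- The directive from this answer, rewritten for Parsec, with try around
-- the closing "}}" so a single '}' inside a directive does not abort it.
directive :: Parser String
directive = string "{{" *> manyTill anyChar (try (string "}}"))

main :: IO ()
main = putStr (streamEditSketch directive (map toUpper) sample)
  where
    sample = unlines ["{{ value }}", "", "This is normal Text", "", "{{ expression }}"]
```

Note that sepCapSketch directive has the same shape as the question's template: a list of Either String a, with plain text on the Left and directives on the Right.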
Upvotes: 1
Reputation: 50864
You have a couple of problems. First, in Parsec, if a parser consumes any input and then fails, that's an error. So, when the parser:
anyChar <* notFollowedBy directive
fails (because the character is followed by a directive), it fails after anyChar has consumed input, and that generates an error immediately. Therefore, the parser:
let p1 = many1 (anyChar <* notFollowedBy directive)
will never succeed if it runs into a directive. For example:
parse p1 "" "okay" -- works
parse p1 "" "oops {{}}" -- will fail after consuming "oops "
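The consume-then-fail rule is easiest to see in isolation. In this small sketch (aThenB, noBacktrack and withBacktrack are illustrative names, not from the question), the branch after <|> is only tried when the first branch fails without consuming input, and wrapping the first branch in try restores backtracking. The loops inside many and many1 follow the same rule.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Consumes 'a' and then fails unless the next character is 'b'.
aThenB :: Parser Char
aThenB = char 'a' *> char 'b'

-- Without try: on "ac", aThenB consumes 'a' before failing,
-- so <|> never tries the second branch and the whole parse fails.
noBacktrack :: Parser Char
noBacktrack = aThenB <|> char 'a'

-- With try: the failed branch is rewound, so char 'a' gets its chance.
withBacktrack :: Parser Char
withBacktrack = try aThenB <|> char 'a'

main :: IO ()
main = do
  print (parse noBacktrack "" "ac")   -- prints a Left (parse error)
  print (parse withBacktrack "" "ac") -- prints Right 'a'
```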
You can fix this by inserting a try clause:
let p2 = many1 (try (anyChar <* notFollowedBy directive))
parse p2 "" "okay {{}}"
which yields Right "okay" and reveals the second problem. Parser p2 only consumes characters that aren't followed by a directive, which excludes the space immediately before the directive. Nothing in your parser can consume a character that is followed by a directive, so the parse gets stuck there.
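One way to see exactly where p2 stops is to return the leftover input alongside the result, using Parsec's getInput. This sketch simplifies the question's directive to return the raw String (so notFollowedBy's Show requirement is met without the Directive type); p2WithRest is an illustrative name.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's directive, simplified to return the raw contents.
directive :: Parser String
directive = string "{{" *> manyTill anyChar (string "}}")

-- p2 from this answer: only takes characters NOT followed by a directive.
p2 :: Parser String
p2 = many1 (try (anyChar <* notFollowedBy directive))

-- Run p2 and also return whatever input it left behind.
p2WithRest :: Parser (String, String)
p2WithRest = (,) <$> p2 <*> getInput

main :: IO ()
main = print (parse p2WithRest "" "okay {{}}")
-- prints: Right ("okay"," {{}}")
```

The leftover starts with the space, which neither p2 nor directive can consume.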
You actually want something like:
let p3 = many1 (notFollowedBy directive *> anyChar)
which first checks that we aren't looking at a directive at the current position, and only then grabs a character. No try clause is needed, because if this fails, it fails without consuming input (notFollowedBy never consumes input, as per the documentation).
parse p3 "" "okay" -- returns Right "okay"
parse p3 "" "okay {{}}" -- returns Right "okay "
parse p3 "" "{{fails}}" -- correctly fails without consuming input
So, replacing txt in your original example with:
txt = Left <$> many1 (notFollowedBy directive *> anyChar)
should work fine.
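Putting the fix together, here is a sketch of the question's parser with the corrected txt, run on an in-memory copy of the sample template with parse rather than parseFromFile. Two changes go beyond the fix described above and are my own additions: both delimiters are wrapped in try, following the manyTill example in the Parsec documentation, because as far as I know Parsec's string can consume part of a delimiter (a lone { or }) before failing; and the types derive Show and Eq so results can be printed and compared. The name sample is just for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Template a = Template [Either String a] deriving (Show, Eq)
data Directive = Directive String deriving (Show, Eq)

templateParser :: Parser (Template Directive)
templateParser = template <* eof

template :: Parser (Template Directive)
template = Template <$> many (dir <|> txt)
  where
    dir = Right <$> directive
    -- Check for a directive *before* taking a character; notFollowedBy
    -- never consumes input, so this fails cleanly at a directive.
    txt = Left <$> many1 (notFollowedBy directive *> anyChar)

directive :: Parser Directive
directive = do
  _ <- try (string "{{")                      -- try: a lone '{' backtracks
  txt <- manyTill anyChar (try (string "}}")) -- try: a lone '}' stays in the text
  return (Directive txt)

sample :: String
sample = unlines ["{{ value }}", "", "This is normal Text", "", "{{ expression }}"]

main :: IO ()
main = print (parse templateParser "" sample)
```

With the extra try calls, input such as "{x}" parses as plain text and "{{ a } b }}" as a single directive, instead of aborting the parse.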
Upvotes: 4
Reputation: 33429
many1 (anyChar <* notFollowedBy directive)
This parses only characters not followed by a directive.
{{ value }}
This is normal Text
{{ expression }}
When parsing the text in the middle, it will stop at the last t, leaving the newline before the directive unconsumed (because it's, well, a character followed by a directive). So on the next iteration, you try to parse a directive and fail. Then you retry txt on that newline: the parser expects it not to be followed by a directive, but it finds one, hence the error.
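The error can be reproduced without a file by running the question's parser on an in-memory string with parse. This sketch assumes the file holds the five-line template with blank lines (consistent with the "line 5" in the question's error); sample and failurePos are illustrative names, and the parser is unchanged apart from the deriving Show clauses. As far as I can tell, the reported position should be line 5, column 17, just after {{ expression }}, because notFollowedBy raises its error after looking ahead over the whole directive.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Template a = Template [Either String a] deriving Show
data Directive = Directive String deriving Show

-- The question's parser, unchanged apart from the deriving clauses.
templateParser :: Parser (Template Directive)
templateParser = do
  tmp <- template
  eof
  return tmp

template :: Parser (Template Directive)
template = Template <$> many (dir <|> txt)
  where
    dir = Right <$> directive
    txt = Left <$> many1 (anyChar <* notFollowedBy directive)

directive :: Parser Directive
directive = do
  _ <- string "{{"
  txt <- manyTill anyChar (string "}}")
  return $ Directive txt

sample :: String
sample = unlines ["{{ value }}", "", "This is normal Text", "", "{{ expression }}"]

-- Line and column of the failure, or Nothing if the parse succeeds.
failurePos :: Maybe (Line, Column)
failurePos = case parse templateParser "" sample of
  Left err -> Just (sourceLine (errorPos err), sourceColumn (errorPos err))
  Right _  -> Nothing

main :: IO ()
main = do
  print (parse templateParser "" sample)
  print failurePos
```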
Upvotes: 0