danidiaz

Reputation: 27771

Attoparsec: skipping up to (but not including) a multi-char delimiter

I have a string that can contain pretty much any character. Inside the string there is the delimiter {{{.

For example: afskjdfakjsdfkjas{{{fasdf.

Using attoparsec, what is the idiomatic way of writing a Parser () that skips all characters before {{{, but without consuming the {{{?

Upvotes: 5

Views: 705

Answers (2)

jub0bs

Reputation: 66183

Use attoparsec's lookAhead (which applies a parser without consuming any input) and manyTill to write a parser that consumes everything up to (but excluding) a {{{ delimiter. You're then free to apply that parser and throw its result away (for example with void from Control.Monad) to get a Parser ().

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ( (<|>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text
import Data.Attoparsec.Combinator ( lookAhead, manyTill )

myParser :: Parser Text
myParser = T.concat <$> manyTill (nonOpBraceSpan <|> opBraceSpan)
                                 (lookAhead $ string "{{{")
                    <?> "{{{"
  where
    opBraceSpan    = takeWhile1 (== '{')
    nonOpBraceSpan = takeWhile1 (/= '{')

In GHCi:

λ> :set -XOverloadedStrings 
λ> parseTest myParser "{foo{{bar{{{baz"
Done "{{{baz" "{foo{{bar"
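Since the question asks for a Parser (), here is a minimal sketch of the same technique with the result discarded. The names skipToDelim and skipThenDelim are hypothetical, chosen only for illustration; the combinators themselves are the ones used above.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ( (<|>) )
import Control.Monad ( void )
import Data.Text ( Text )
import Data.Attoparsec.Text
import Data.Attoparsec.Combinator ( lookAhead, manyTill )

-- Hypothetical name: skips input up to, but not including, "{{{".
-- Fails if the input contains no "{{{" at all.
skipToDelim :: Parser ()
skipToDelim = void (manyTill chunk (lookAhead (string "{{{")))
  where
    -- A run of non-brace characters, or a run of opening braces.
    chunk = takeWhile1 (/= '{') <|> takeWhile1 (== '{')

-- Hypothetical name: shows that the delimiter is still there to be
-- parsed once skipToDelim has finished.
skipThenDelim :: Parser Text
skipThenDelim = skipToDelim *> string "{{{"
```

With these definitions, parseOnly skipThenDelim "afskjdfakjsdfkjas{{{fasdf" should yield Right "{{{", because the skipped prefix is thrown away and the delimiter is left for the following parser.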

Upvotes: 3

Jeremy List

Reputation: 1766

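You can do it the slightly harder way like this. When the input starts with {{{, the first alternative yields Nothing, the Just c pattern fails to match, and many stops. Because attoparsec backtracks on failure, the {{{ is left unconsumed (this also makes try redundant here, though harmless):

foo :: Parser String
foo = many $ do
  -- Nothing marks the delimiter; the failed pattern match ends the loop.
  Just c <- fmap (const Nothing) (try $ string "{{{") <|> fmap Just anyChar
  return c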

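Or you could use the helper function manyTill. Wrap the delimiter in lookAhead so that it is matched but not consumed; without lookAhead, manyTill consumes the {{{ as well:

foo = manyTill anyChar (lookAhead $ string "{{{")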
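Whether the delimiter survives depends on whether the end parser passed to manyTill is wrapped in lookAhead. The sketch below compares the two variants; consuming, nonConsuming and withRest are hypothetical names introduced only for this comparison.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text ( Text )
import Data.Attoparsec.Text
import Data.Attoparsec.Combinator ( lookAhead, manyTill )

-- Consumes the delimiter along with the skipped characters.
consuming :: Parser String
consuming = manyTill anyChar (string "{{{")

-- Stops in front of the delimiter and leaves it in the input.
nonConsuming :: Parser String
nonConsuming = manyTill anyChar (lookAhead (string "{{{"))

-- Hypothetical helper: run a parser, then return whatever input remains.
withRest :: Parser a -> Text -> Either String (a, Text)
withRest p = parseOnly ((,) <$> p <*> takeText)
```

Here withRest consuming "ab{{{cd" should give Right ("ab","cd"), while withRest nonConsuming "ab{{{cd" should give Right ("ab","{{{cd"). Only the second matches what the question asks for.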

Upvotes: 0
